
Re: [HTCondor-users] job status from python



Ack, yes, that's it...

"It can either be a list of job IDs or a string specifying a
constraint to match jobs."

For some reason, I had interpreted that as "or a string with the job ID".
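
To make the distinction concrete, here is a minimal sketch of building the constraint string before passing it to act(). The helper names job_constraint and remove_jobs are hypothetical (they are not part of the htcondor bindings), and the sketch assumes schedd is an htcondor.Schedd() object as in the messages below. Only ClusterId, ProcId, htcondor.JobAction.Remove and Schedd.act() come from HTCondor itself.

```python
def job_constraint(cluster_id, proc_id=None):
    """Build a ClassAd constraint string matching one cluster or one job.

    Hypothetical helper: a bare string such as '1763' is evaluated as a
    constraint expression, not as a job ID, so spell out the comparison.
    """
    expr = "ClusterId == {:d}".format(int(cluster_id))
    if proc_id is not None:
        expr += " && ProcId == {:d}".format(int(proc_id))
    return expr


def remove_jobs(schedd, cluster_id, proc_id=None):
    """Remove only the matching job(s); schedd is assumed to be htcondor.Schedd()."""
    # Imported here so job_constraint() can be used without the bindings installed.
    import htcondor
    return schedd.act(htcondor.JobAction.Remove,
                      job_constraint(cluster_id, proc_id))
```

With this, remove_jobs(schedd, 1763) sends the constraint 'ClusterId == 1763' rather than the bare string '1763'. The other form the documentation mentions, a list of job IDs, would be a list of strings; to my understanding these are written as "cluster.proc" (for example ['1763.0']), but check that against the documentation for your version.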

Jason

On Fri, Jan 12, 2018 at 8:58 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
> I suspect it's treating '1763' as an unusual form of writing true. Try
> 'clusterid==1763' instead.
>
>
> sent from my phone
>
> ________________________________
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Larry
> Martell <larry.martell@xxxxxxxxx>
> Sent: Thursday, January 11, 2018 9:07:31 PM
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] job status from python
>
> Thanks so much. This is extremely helpful to me. I have code for all
> this implemented and it's almost working - I just have 1 issue - when
> I try and remove 1 job from the queue they all get removed.
>
> This is my queue:
>
> $ condor_q -all
>
> -- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:03:37
> OWNER     BATCH_NAME                   SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
> prod_user CMD: compute_radiology.py   1/11 21:32      _      _      _     87 1739.0 ... 1825.0
>
> 87 jobs; 87 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
>
> Then I run this:
>
> schedd.act(htcondor.JobAction.Remove, '1763')
>
> which returns:
>
> [
>         TotalJobAds = 87;
>         TotalPermissionDenied = 0;
>         TotalAlreadyDone = 0;
>         TotalNotFound = 0;
>         TotalSuccess = 87;
>         TotalChangedAds = 1;
>         TotalBadStatus = 0;
>         TotalError = 0
>     ]
>
> and then the queue is empty:
>
> $ condor_q -all
>
> -- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:04:32
> OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
>
> 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
>
> How can I just remove the one job from the queue?
>
> Thanks!
>
> On Tue, Jan 9, 2018 at 9:34 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
>> For looking up jobs that have finished, there is
>> htcondor.Schedd().history(expression, projection), where you could
>> query for something like history(expression = 'ClusterId == 123',
>> projection = []) to get the ClassAd of job 123 *if* it has cleared the
>> queue. However, querying the history is ***very slow***.
>>
>> Two better options:
>> 1) Parse and watch the job log file, if your script can get to it. The
>> job log file will update when the job has started running, update
>> occasionally with resource usage, and update with the exit code when
>> it has finished.
>>
>> 2) Leave the job in the queue when it's completed and have your script
>> remove it:
>>
>> In your Submit objects, set { 'leave_in_queue': '(JobStatus == 4)' },
>> which means when a job has completed, leave it in the queue. See
>> https://research.cs.wisc.edu/htcondor/manual/current/12_Appendix_A.html
>> to see what the value of JobStatus means.
>>
>> Query for jobs' ClassAds using htcondor.Schedd().xquery().
>>
>> If JobStatus == 4 and/or if ExitCode is defined, then you know the job
>> is done. Remove it from the queue by sending a
>> htcondor.Schedd().act(htcondor.JobAction.Remove, str(ClusterId)). (You
>> can also send a list of ClusterIds.)
>>
>>
>> Constantly querying the Schedd probably won't scale well with the size
>> of the queue, so if you can use the job log, that's probably the
>> better of the two solutions.
>>
>> Jason
>>
>> On Jan 8, 2018 5:24 PM, "Larry Martell" <larry.martell@xxxxxxxxx> wrote:
>>>
>>> Is there any python API for checking the status of jobs?
>>>
>>> On Sun, Jan 7, 2018 at 3:47 PM Larry Martell <larry.martell@xxxxxxxxx>
>>> wrote:
>>>>
>>>> I am submitting jobs like this:
>>>>
>>>>     sub = htcondor.Submit(submit_dict)
>>>>     with schedd.transaction() as txn:
>>>>         id = sub.queue(txn)
>>>>
>>>> Now I want to be able to tell if the job has completed or not, and
>>>> when it has completed, if it succeeded or failed. How can I do that?
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
>