
Re: [HTCondor-users] job status from python



Thanks John and Jason.

On Fri, Jan 12, 2018 at 10:01 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
> Ack, yes, that's it...
>
> "It can either be a list of job IDs or a string specifying a
> constraint to match jobs."
>
> For some reason, I had interpreted that as "or a string with the job ID".
>
> Jason
>
> On Fri, Jan 12, 2018 at 8:58 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>> I suspect it's treating '1763' as an unusual form of writing true. Try
>> 'clusterid==1763' instead.
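
To spell out the suggestion above: a bare string passed to act() is read as a constraint expression, so the job has to be named by attribute. Below is a minimal sketch, not a definitive implementation. The helper names job_constraint and remove_jobs are made up for illustration, and the action is passed in as an argument so the helpers do not need the htcondor module themselves.

```python
def job_constraint(cluster_id, proc_id=None):
    """Build a constraint string that matches one cluster, or one job in it."""
    constraint = "ClusterId == %d" % int(cluster_id)
    if proc_id is not None:
        constraint += " && ProcId == %d" % int(proc_id)
    return constraint


def remove_jobs(schedd, remove_action, cluster_id, proc_id=None):
    """Remove only the matching jobs.

    schedd is expected to be an htcondor.Schedd() and remove_action
    htcondor.JobAction.Remove; both are supplied by the caller.
    """
    return schedd.act(remove_action, job_constraint(cluster_id, proc_id))


# Intended use (assumes the htcondor Python bindings are installed):
#   import htcondor
#   result = remove_jobs(htcondor.Schedd(), htcondor.JobAction.Remove, 1763)
```

The returned ClassAd should then report a TotalJobAds count equal to the number of jobs in that cluster rather than the whole queue.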
>>
>>
>> sent from my phone
>>
>> ________________________________
>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Larry
>> Martell <larry.martell@xxxxxxxxx>
>> Sent: Thursday, January 11, 2018 9:07:31 PM
>> To: HTCondor-Users Mail List
>> Subject: Re: [HTCondor-users] job status from python
>>
>> Thanks so much. This is extremely helpful to me. I have code for all
>> this implemented and it's almost working. I have just one issue: when
>> I try to remove one job from the queue, they all get removed.
>>
>> This is my queue:
>>
>> $ condor_q -all
>>
>> -- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:03:37
>> OWNER     BATCH_NAME                   SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
>> prod_user CMD: compute_radiology.py   1/11 21:32      _      _      _     87 1739.0 ... 1825.0
>>
>> 87 jobs; 87 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
>>
>> Then I run this:
>>
>> schedd.act(htcondor.JobAction.Remove, '1763')
>>
>> which returns:
>>
>> [
>>         TotalJobAds = 87;
>>         TotalPermissionDenied = 0;
>>         TotalAlreadyDone = 0;
>>         TotalNotFound = 0;
>>         TotalSuccess = 87;
>>         TotalChangedAds = 1;
>>         TotalBadStatus = 0;
>>         TotalError = 0
>>     ]
>>
>> and then the queue is empty:
>>
>> $ condor_q -all
>>
>> -- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:04:32
>> OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
>>
>> 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
>>
>> How can I just remove the one job from the queue?
>>
>> Thanks!
>>
>> On Tue, Jan 9, 2018 at 9:34 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
>>> For looking up jobs that have finished, there is
>>> htcondor.Schedd().history(expression, projection), where you could
>>> query for something like history(expression = 'ClusterId == 123',
>>> projection = []) to get the ClassAd of job 123 *if* it has cleared the
>>> queue. However, querying the history is ***very slow***.
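
A rough sketch of the history lookup described above. The helper name finished_job_ads, the attribute list, and the limit of 10 are illustrative choices, and the third positional argument is assumed to be the bindings' match limit, which keeps a slow history scan from running longer than needed.

```python
def finished_job_ads(schedd, cluster_id, limit=10):
    """Return history ClassAds for jobs of one cluster that have left the queue.

    schedd is expected to be an htcondor.Schedd(). An empty list means
    nothing from that cluster is in the history yet.
    """
    constraint = "ClusterId == %d" % int(cluster_id)
    # Ask only for the attributes we need; an empty projection returns full ads.
    projection = ["ClusterId", "ProcId", "JobStatus", "ExitCode"]
    return list(schedd.history(constraint, projection, limit))


# Intended use (assumes the htcondor Python bindings are installed):
#   import htcondor
#   for ad in finished_job_ads(htcondor.Schedd(), 123):
#       print(ad.get("ProcId"), ad.get("ExitCode"))
```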
>>>
>>> Two better options:
>>> 1) Parse and watch the job log file, if your script can get to it. The
>>> job log file will update when the job has started running, update
>>> occasionally with resource usage, and update with the exit code when
>>> it has finished.
>>>
>>> 2) Leave the job in the queue when it's completed and have your script
>>> remove it:
>>>
>>> In your Submit objects, set { 'leave_in_queue': '(JobStatus == 4)' },
>>> which means when a job has completed, leave it in the queue. See
>>> https://research.cs.wisc.edu/htcondor/manual/current/12_Appendix_A.html
>>> to see what the value of JobStatus means.
>>>
>>> Query for jobs' ClassAds using htcondor.Schedd().xquery().
>>>
>>> If JobStatus == 4 and/or if ExitCode is defined, then you know the job
>>> is done. Remove it from the queue by sending a
>>> htcondor.Schedd().act(htcondor.JobAction.Remove, str(ClusterId)). (You
>>> can also send a list of ClusterIds.)
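
Option 2 could be sketched roughly as follows. The helper names (with_leave_in_queue, summarize, poll_cluster) are hypothetical, and treating ExitCode == 0 as success is a convention assumed here, not something HTCondor enforces. The ads are read with .get(), so plain dicts work for experimenting as well.

```python
# Standard JobStatus codes.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def with_leave_in_queue(submit_dict):
    """Return a copy of the submit description that keeps completed jobs queued."""
    updated = dict(submit_dict)
    updated["leave_in_queue"] = "(JobStatus == 4)"
    return updated


def summarize(ad):
    """Reduce one job ad to its state and whether it finished and succeeded."""
    status = ad.get("JobStatus")
    exit_code = ad.get("ExitCode")
    done = status == 4 or exit_code is not None
    return {
        "job": "%s.%s" % (ad.get("ClusterId"), ad.get("ProcId")),
        "state": JOB_STATUS.get(status, "Unknown"),
        "done": done,
        # None while the job is still in flight; assumes exit code 0 means success.
        "succeeded": (exit_code == 0) if done else None,
    }


def poll_cluster(schedd, cluster_id):
    """Query the schedd (an htcondor.Schedd()) for one cluster's jobs."""
    constraint = "ClusterId == %d" % int(cluster_id)
    projection = ["ClusterId", "ProcId", "JobStatus", "ExitCode"]
    return [summarize(ad) for ad in schedd.xquery(constraint, projection)]
```

Once a job reports done, it still has to be removed from the queue explicitly, since leave_in_queue keeps it there.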
>>>
>>>
>>> Constantly querying the Schedd probably won't scale well with the size
>>> of the queue, so if you can use the job log, that's probably the
>>> better of the two solutions.
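
For the job log route, here is a minimal sketch that scans the log text directly rather than going through the bindings. It assumes the classic user log layout: each event starts with a three-digit event number and a (cluster.proc.subproc) job ID, event 005 marks termination, the exit code appears on a following line as "(return value N)", and each event ends with a line containing "...". Check those assumptions against a log from your own pool before relying on it. The function name exit_codes_from_log is made up for illustration.

```python
import re

# Assumed event header layout: "005 (1763.000.000) <timestamp> <description>"
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) ")
RETURN_VALUE = re.compile(r"\(return value (-?\d+)\)")


def exit_codes_from_log(text):
    """Map (cluster, proc) to the exit code of each 'job terminated' event."""
    results = {}
    current = None  # job whose termination event is being read, if any
    for line in text.splitlines():
        header = EVENT_HEADER.match(line)
        if header:
            event, cluster, proc, _subproc = header.groups()
            current = (int(cluster), int(proc)) if event == "005" else None
        elif line.strip() == "...":
            current = None  # end of the current event
        elif current is not None:
            found = RETURN_VALUE.search(line)
            if found:
                results[current] = int(found.group(1))
    return results
```

A job that is missing from the result has either not terminated yet or did not exit normally (for example, it was killed by a signal), so the caller should treat "absent" and "nonzero" differently.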
>>>
>>> Jason
>>>
>>> On Jan 8, 2018 5:24 PM, "Larry Martell" <larry.martell@xxxxxxxxx> wrote:
>>>>
>>>> Is there any python API for checking the status of jobs?
>>>>
>>>> On Sun, Jan 7, 2018 at 3:47 PM Larry Martell <larry.martell@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> I am submitting jobs like this:
>>>>>
>>>>>     sub = htcondor.Submit(submit_dict)
>>>>>     with schedd.transaction() as txn:
>>>>>         id = sub.queue(txn)
>>>>>
>>>>> Now I want to be able to tell if the job has completed or not, and
>>>>> when it has completed, if it succeeded or failed. How can I do that?
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/