
Re: [HTCondor-users] job status from python



Thanks so much. This is extremely helpful to me. I have implemented
all of this and it is almost working. I have just one issue: when I
try to remove one job from the queue, they all get removed.

This is my queue:

$ condor_q -all

-- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:03:37
OWNER     BATCH_NAME                   SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
prod_user CMD: compute_radiology.py   1/11 21:32      _      _      _     87 1739.0 ... 1825.0

87 jobs; 87 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

Then I run this:

schedd.act(htcondor.JobAction.Remove, '1763')

which returns:

[
        TotalJobAds = 87;
        TotalPermissionDenied = 0;
        TotalAlreadyDone = 0;
        TotalNotFound = 0;
        TotalSuccess = 87;
        TotalChangedAds = 1;
        TotalBadStatus = 0;
        TotalError = 0
    ]

and then the queue is empty:

$ condor_q -all

-- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:04:32
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

How can I just remove the one job from the queue?
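
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: the second argument of act() can be either a list of job IDs
or a constraint expression string, and a bare string such as '1763'
would then be evaluated as a constraint that is true for every job,
which is consistent with TotalSuccess = 87 above. A minimal sketch of
building a constraint that targets a single job follows. The helper
names single_job_constraint and remove_one_job are hypothetical, and
proc_id=0 assumes each cluster holds one job, as in the queue shown.

```python
def single_job_constraint(cluster_id, proc_id=0):
    """Build a ClassAd constraint matching exactly one job (hypothetical helper)."""
    # int() rejects anything that is not a whole number, so the result is
    # always a comparison expression and never a bare literal like "1763".
    return "ClusterId == {} && ProcId == {}".format(int(cluster_id), int(proc_id))


def remove_one_job(schedd, cluster_id, proc_id=0):
    """Ask the schedd to remove one job; returns whatever act() returns."""
    import htcondor  # imported lazily so the helper above works without the bindings

    return schedd.act(htcondor.JobAction.Remove,
                      single_job_constraint(cluster_id, proc_id))


print(single_job_constraint("1763"))  # ClusterId == 1763 && ProcId == 0
```

Passing a list of job ID strings such as ['1763.0'] instead of a
constraint may also work, if the installed bindings accept that form.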

Thanks!

On Tue, Jan 9, 2018 at 9:34 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
> For looking up jobs that have finished, there is
> htcondor.Schedd().history(expression, projection), where you could
> query for something like history(expression = 'ClusterId == 123',
> projection = []) to get the ClassAd of job 123 *if* it has cleared the
> queue. However, querying the history is ***very slow***.
>
> Two better options:
> 1) Parse and watch the job log file, if your script can get to it. The
> job log file will update when the job has started running, update
> occasionally with resource usage, and update with the exit code when
> it has finished.
>
> 2) Leave the job in the queue when it's completed and have your script
> remove it:
>
> In your Submit objects, set { 'leave_in_queue': '(JobStatus == 4)' },
> which means when a job has completed, leave it in the queue. See
> https://research.cs.wisc.edu/htcondor/manual/current/12_Appendix_A.html
> to see what the value of JobStatus means.
>
> Query for jobs' ClassAds using htcondor.Schedd().xquery().
>
> If JobStatus == 4 and/or if ExitCode is defined, then you know the job
> is done. Remove it from the queue by sending a
> htcondor.Schedd().act(htcondor.JobAction.Remove, str(ClusterId)). (You
> can also send a list of ClusterIds.)
>
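
The "is the job done, and did it succeed" check described above can be
sketched as a small function. This is a minimal sketch under stated
assumptions: the job ad is treated as a mapping with a get() method (a
plain dict stands in for the ClassAd here), job_outcome is a
hypothetical name, and counting ExitCode 0 as success is an assumption
about the job being run.

```python
COMPLETED = 4  # JobStatus value for a completed job


def job_outcome(ad):
    """Classify a job ad as 'pending', 'succeeded', 'failed', or 'unknown'."""
    status = ad.get("JobStatus")
    exit_code = ad.get("ExitCode")

    # The job counts as done if it is marked completed or has an exit code.
    done = status == COMPLETED or exit_code is not None
    if not done:
        return "pending"
    if exit_code is None:
        # Completed without an exit code, for example killed by a signal.
        return "unknown"
    # Assumption: exit code 0 means success for this job.
    return "succeeded" if exit_code == 0 else "failed"


# Plain dicts used as stand-ins for job ClassAds.
print(job_outcome({"JobStatus": 2}))
print(job_outcome({"JobStatus": 4, "ExitCode": 0}))
print(job_outcome({"JobStatus": 4, "ExitCode": 1}))
```

A polling script could call this on each ad returned by the queue query
and only issue the remove action for ads that are no longer 'pending'.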
>
> Constantly querying the Schedd probably won't scale well with the size
> of the queue, so if you can use the job log, that's probably the
> better of the two solutions.
>
> Jason
>
> On Jan 8, 2018 5:24 PM, "Larry Martell" <larry.martell@xxxxxxxxx> wrote:
>>
>> Is there any python API for checking the status of jobs?
>>
>> On Sun, Jan 7, 2018 at 3:47 PM Larry Martell <larry.martell@xxxxxxxxx> wrote:
>>>
>>> I am submitting jobs like this:
>>>
>>>     sub = htcondor.Submit(submit_dict)
>>>     with schedd.transaction() as txn:
>>>         id = sub.queue(txn)
>>>
>>> Now I want to be able to tell if the job has completed or not, and
>>> when it has completed, if it succeeded or failed. How can I do that?