
Re: [HTCondor-users] htcondor python api



htcondor.JobEventLog(path_to_logfile) is indeed what you want to use here:
https://htcondor.readthedocs.io/en/latest/apis/python-bindings/api/htcondor.html#htcondor.JobEventLog

In case it's not clear from the condor_watch_q code, what you want to do is put an inner "for event in jel.events(stop_after=1)" loop inside an outer loop, then break from the outer loop after accumulating the number of htcondor.JobEventType.JOB_TERMINATED events that you expect to see (or after some timeout period).

The condor_watch_q code can be a little heavy, so I whipped up a simple function that I think accomplishes what you want to do:

import htcondor
import time

def wait_for_job(logfile, num_jobs, timeout=None):
    """Block until num_jobs termination events appear in logfile."""
    start = time.time()
    completed = 0
    jel = htcondor.JobEventLog(logfile)
    while True:
        # stop_after=0 returns immediately once no more events are available
        for event in jel.events(stop_after=0):
            completed += int(event.type == htcondor.JobEventType.JOB_TERMINATED)
            if event.type in {  # catch some non-termination events that halt job progress
                    htcondor.JobEventType.JOB_ABORTED,
                    htcondor.JobEventType.JOB_HELD,
                    htcondor.JobEventType.CLUSTER_REMOVE,
                }:
                raise RuntimeError("A job was aborted, held, or removed")
        if completed >= num_jobs:  # all expected jobs have completed
            break
        if timeout is not None and (time.time() - start) > timeout:
            raise RuntimeError("Timed out waiting for job to complete")
        time.sleep(1)  # wait one second before polling again


This is certainly not perfect by any means (there are other events you might want to raise an exception on, or maybe you don't want to raise an exception at all), but hopefully it gets the idea across.
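
If you would rather not raise an exception at all, one possible variant is to return a status string and let the caller decide what to do. The following is only a rough sketch: the function name wait_for_jobs_status and its parameters (poll, done_type, failed_types, interval) are made up for illustration and are not part of the HTCondor bindings. The polling step is passed in as a callable so the loop itself does not depend on htcondor.

```python
import time

def wait_for_jobs_status(poll, done_type, failed_types, num_jobs,
                         timeout=None, interval=1.0):
    """Return "completed", "failed", or "timeout" instead of raising.

    poll is a hypothetical callable that returns the event types seen
    since the previous call (possibly an empty list).
    """
    start = time.time()
    completed = 0
    while True:
        for event_type in poll():
            if event_type == done_type:
                completed += 1
            elif event_type in failed_types:
                return "failed"
        if completed >= num_jobs:
            return "completed"
        if timeout is not None and (time.time() - start) > timeout:
            return "timeout"
        time.sleep(interval)  # wait before polling again

# Possible wiring to the bindings (assumes a log file named "job.log"):
#
#   import htcondor
#   jel = htcondor.JobEventLog("job.log")
#   status = wait_for_jobs_status(
#       poll=lambda: [e.type for e in jel.events(stop_after=0)],
#       done_type=htcondor.JobEventType.JOB_TERMINATED,
#       failed_types={
#           htcondor.JobEventType.JOB_ABORTED,
#           htcondor.JobEventType.JOB_HELD,
#           htcondor.JobEventType.CLUSTER_REMOVE,
#       },
#       num_jobs=1,
#       timeout=3600,
#   )
```

Which event types count as "failed" is a judgment call; a held job, for example, may later be released, so you may prefer to keep waiting on JOB_HELD rather than return early.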

Jason Patton

On Thu, Apr 22, 2021 at 8:53 AM <htcondor-users@xxxxxxxxxxx> wrote:
Hi,

Unfortunately there isn't a native way in the Python bindings to wait on a job to complete. You should check out the JobStateTracker class from condor_watch_q for examples of how to parse the event logs: https://github.com/htcondor/htcondor/blob/master/src/condor_scripts/condor_watch_q#L831

- Brian

On 4/22/21 7:58 AM, rmorgan466@xxxxxxxxx wrote:
Using the Python API, I can submit a job, but is there a way to wait until the job is completed?

There is htcondor.JobEventLog("logfile"), but I am not sure how to use it to wait until a task is completed.

--
--- Get your facts first, then you can distort them as you please.--

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
