
Re: [HTCondor-users] post job hook



Hi Keith,

There's a new python LogReader API (added in 8.3.4) that we've been kicking around for processing events in the schedd's job queue log.  If you're a python-oriented person, you might find it useful.

Here's a simple script:

""""
import os
import htcondor

log_reader = htcondor.LogReader(os.path.join(htcondor.param["SPOOL"], "job_queue.log"))
log_reader.setBlocking(True)

for event in log_reader:
    if event['event'] == htcondor.EntryType.SetAttribute and event['name'] == "JobStatus" and event['value'] == 4:
        print "Job %s left queue." % event['key']
"""

A few notes:
- It's a young API; I noticed there are a few rough edges (the constructor should use the job queue by default; no exception is thrown if you give an invalid name).
- As it tails the schedd's job queue (which contains sensitive information), it must run as a privileged process (file is owned by 'condor' and mode 600).
- The above code processes the whole queue each time it is started.  So, you'll need to track where you left off if you don't want to go through the whole thing again (a rough sketch follows below).
- As with any event-based API, you'll want to periodically re-sync your knowledge of the job queue to avoid being bitten by some unknown bug that misses events (also sketched below).
- Events ("set attribute", "new cluster", "destroy cluster", etc) occur independently.  So, if you want to know something about the job (for example, some value of an attribute when the job completed), you have to track this yourself.  Further, there's no guarantee of the ordering of the attribute changes.  For example, if you wanted to see how much accumulated wall time the job had when it exited, you couldn't use the above code.  You'd need:

"""
import os
import htcondor

log_reader = htcondor.LogReader(os.path.join(htcondor.param["SPOOL"], "job_queue.log"))
log_reader.setBlocking(True)

job_durations = {}
for event in log_reader:
    if event['event'] == htcondor.EntryType.SetAttribute:
        if event['name'] == 'RemoteWallClockTime':
            job_durations[event['key']] = event['value']
    if event['event'] == htcondor.EntryType.DestroyClassAd and event['key'] in job_durations:
        print "Job %s left queue after %d seconds of wall time." % (event['key'], job_durations[event['key']].eval())
"""

This is because, from the log reader's point of view, JobStatus may be set to 4 before RemoteWallClockTime is set.
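To make the "where you left off" note above concrete: I don't believe the current bindings expose a log offset, so the crude approach below just counts events and persists the count.  The checkpoint path is made up, and note the schedd periodically compacts job_queue.log, which would invalidate the count - treat this purely as a starting point:

"""
import os
import htcondor

CHECKPOINT = "/var/lib/myapp/jobqueue.offset"  # hypothetical location

# Load how many events a previous run already handled.
seen = 0
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT) as fp:
        seen = int(fp.read().strip() or 0)

log_reader = htcondor.LogReader(os.path.join(htcondor.param["SPOOL"], "job_queue.log"))
log_reader.setBlocking(True)

count = 0
for event in log_reader:
    count += 1
    if count <= seen:
        continue  # the log replays from the start; skip what we already handled
    # ... handle the event here ...
    with open(CHECKPOINT, "w") as fp:  # record progress after each event
        fp.write(str(count))
"""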

On Linux, the LogReader API is inotify-based, meaning that the latency to event notification is relatively low.
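For the periodic re-sync mentioned above, the simplest approach is probably an occasional full query against the schedd, reconciling the result with your event-derived state.  A minimal sketch - the attribute list and interval here are just illustrative:

"""
import time
import htcondor

schedd = htcondor.Schedd()

while True:
    # Rebuild a full picture of the queue from scratch.
    jobs = {}
    for ad in schedd.query("true", ["ClusterId", "ProcId", "JobStatus"]):
        key = "%d.%d" % (ad["ClusterId"], ad["ProcId"])
        jobs[key] = ad["JobStatus"]
    # ... reconcile 'jobs' against the state built up from log events ...
    time.sleep(600)  # for example, every ten minutes
"""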

So, still a pretty raw API - but it's going to be efficient compared to most other approaches.  I'm not currently planning on doing a high-level API, but welcome ideas.

Hope this helps,

Brian

PS - if you want to do this unprivileged, you can use the EventIterator to watch a job event log.  On Linux, both EventIterator and LogReader can return a file descriptor which will be marked as readable when new events are ready - this allows for an efficient select()-based loop to watch multiple files.  For example:
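A sketch of what that loop might look like - I'm writing watch() from memory as the accessor that hands back the descriptor, so verify the exact name (and the log paths, which are placeholders) against the bindings docs:

"""
import select
import htcondor

# One iterator per job event log we want to follow.
readers = [
    htcondor.read_events(open("/path/to/job1.log")),
    htcondor.read_events(open("/path/to/job2.log")),
]
# Map each iterator's inotify descriptor back to its iterator.
fd_map = dict((reader.watch(), reader) for reader in readers)

while True:
    # select() wakes us as soon as any of the logs has a new event.
    ready, _, _ = select.select(list(fd_map.keys()), [], [])
    for fd in ready:
        for event in fd_map[fd]:
            print("New event of type %s" % event["MyType"])
"""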

PPS - CC'ing a few folks explicitly who might be interested in the API.

> On Mar 21, 2015, at 8:15 AM, Keith Brown <keith6014@xxxxxxxxx> wrote:
> 
> is there any mechanism to execute a post job routine after every job?
> I am trying to get realtime job finish status on many schedulers
> (~30).  I can look at condor_history logs but ideally i would like to
> post to a website when the user's job is finished.
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/