
[Condor-users] capturing job classad in DAG POST script

I would like to capture the ClassAd for a job that has just completed, at the time my DAG POST script runs.  The problem is that by then the ClassAd is usually no longer available via condor_q -l $ClusterId.  Is my only option to add a "sleep X" statement, where X is long enough for "condor_history -l $ClusterId" to work?
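
Rather than a fixed sleep, something like the rough sketch below is what I had in mind: a POST script helper that polls condor_history until the ad shows up or a timeout expires.  The cluster id argument and the timeout/interval values are just placeholders for whatever the DAG actually passes in.

#!/usr/bin/env python
# Rough sketch: poll condor_history for the job's ClassAd instead of
# sleeping a fixed amount of time.  Assumes the cluster id is passed as
# the first argument (however your DAG arranges that).

import subprocess
import sys
import time

def fetch_history_ad(cluster_id, timeout=300, interval=10):
    """Return the raw output of 'condor_history -l <cluster_id>', polling
    every 'interval' seconds until something appears or 'timeout' expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = subprocess.Popen(["condor_history", "-l", str(cluster_id)],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate()
        out = out.decode("utf-8", "replace")
        if out.strip():                # ad has made it into the history file
            return out
        time.sleep(interval)           # not there yet, wait and retry
    return None

if __name__ == "__main__":
    ad = fetch_history_ad(sys.argv[1])
    if ad is None:
        sys.exit(1)                    # DAGMan will see the POST script fail
    sys.stdout.write(ad)
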

And how much of the information about where the job ran will still be available in the condor_history ClassAd?  FWIW, I'm running jobs in the "grid/gt2" universe as part of Open Science Grid, and what I'm looking for, in particular, are details about where failed jobs were trying to run.  I could also pull this information out of the job log files, but because I run large DAGs I have a single log file for the DAG and a single shared log file for all DAG node jobs, so it is difficult to quickly extract failure information from them.  It would be much nicer if my POST script could capture this information and record it to a "failed job log", and using the ClassAd is my idea for how to do that (see the sketch below).
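
Concretely, if the history ad is available, I'm imagining the POST script doing something like this.  The attribute names (GridResource, GridJobId, etc.) are my guesses at what identifies the gt2 gatekeeper; whatever the ads actually contain would do.

# Sketch of the "failed job log" idea: pick the attributes that say where
# the job was trying to run out of the ClassAd text and append one line
# per failed job.  The attribute list is a guess (GridResource/GridJobId
# for grid universe, LastRemoteHost otherwise); adjust to what the ads
# actually contain.

import re

WANTED = ("GridResource", "GridJobId", "LastRemoteHost",
          "JobStatus", "ExitCode", "HoldReason")

def record_failure(cluster_id, classad_text, logfile="failed_jobs.log"):
    """Append the interesting attributes of a failed job's ClassAd."""
    fields = {}
    for line in classad_text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(.*)", line)
        if m and m.group(1) in WANTED:
            fields[m.group(1)] = m.group(2).strip()
    with open(logfile, "a") as log:
        log.write("%s %s\n" % (cluster_id,
                  " ".join("%s=%s" % item for item in sorted(fields.items()))))
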

If "sleep X" is my only option, what is a reasonable value of X, for a system where there are perhaps 6000 queued jobs, and jobs completing at a rate of about once every 10 seconds.

Thanks,

Ian
-- 
Ian Stokes-Rees, PhD                       W: http://hkl.hms.harvard.edu
ijstokes@xxxxxxxxxxxxxxxxxxx               T: +1 617 432-5608 x75
NEBioGrid, Harvard Medical School          C: +1 617 331-5993
