I would like to capture the classad for a job that
has just completed when my DAG POST script runs. The problem I have is
that the classad is usually no longer available via condor_q -l
$ClusterId. Is my only solution to add a "sleep X" statement, where X
is suitably long for "condor_history -l $ClusterId" to work?
And in the "condor_history" classad, how much of the information about where the job ran will still be available? FWIW, I'm running jobs in the "grid/gt2" universe as part of Open Science Grid. What I'm looking for, in particular, are details about where failed jobs were trying to run.

I can also pull this information out of the job log files, but because I run large DAGs, I have a single log file for the DAG and a single shared log file for all DAG node jobs. It is difficult to quickly extract failure information from these, and it would be much nicer if my POST script could capture it directly and record it to a "failed job log". Using the classad is my idea for how to capture this information.
If "sleep X" is my only option, what is a reasonable value of X for a system with perhaps 6000 queued jobs, where jobs complete at a rate of about one every 10 seconds?
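To make the question concrete, here is a rough sketch of what I imagine the POST script doing instead of a single fixed sleep: try "condor_q -l", then "condor_history -l", and retry a few times before giving up. This is only an illustration under assumptions. The function names, the retry count, the delay, and the log format are all hypothetical placeholders, and I have not confirmed that retrying like this is the recommended approach.

```python
import subprocess
import time


def run_command(argv):
    """Run a command and return its stdout, or "" if it fails or is missing."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


def fetch_classad(cluster_id, attempts=10, delay=5.0,
                  run=run_command, sleep=time.sleep):
    """Return the long-form classad text for cluster_id, or None if not found.

    Each attempt asks condor_q first and then condor_history.
    attempts and delay are placeholder values, not recommendations.
    """
    for attempt in range(attempts):
        for tool in ("condor_q", "condor_history"):
            text = run([tool, "-l", str(cluster_id)])
            if text.strip():
                return text
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return None


def append_failed_job(log_path, cluster_id, classad_text):
    """Append one classad to a plain-text "failed job log" (hypothetical format)."""
    with open(log_path, "a") as log:
        log.write("=== cluster %s ===\n%s\n" % (cluster_id, classad_text))
```

The POST script would call fetch_classad with the node's cluster id and, if the job failed and a classad came back, pass the text to append_failed_job. The question of how long to keep retrying is the same as the question of what X should be.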
--
Ian Stokes-Rees, PhD                 W: http://hkl.hms.harvard.edu
ijstokes@xxxxxxxxxxxxxxxxxxx         T: +1 617 432-5608 x75
NEBioGrid, Harvard Medical School    C: +1 617 331-5993