
[HTCondor-users] Determining DAGman retry number



Hi Greg, Kent,

Is there a way to get the DAGMan retry number for a job from its ClassAd? I want the number that's printed in dagman.out:

02/26/17 20:54:20 Retrying node inspiral-IMBHIMRPHENOMD_INJ-H1_ID513_ID0091456 (retry #1 of 3)...

NumJobRestarts doesn't work for me, because a retried job is resubmitted and gets a different HTCondor job ID. I want to write a ClassAd expression that controls the submission site based on the number of times a job has been retried.
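
As a stopgap, the retry number can be scraped from dagman.out itself. This is a minimal sketch in Python, assuming the "Retrying node ... (retry #N of M)" line format shown above stays stable. The function name latest_retries is hypothetical, and this only reads the log; it does not put the value into the job's ClassAd:

```python
import re

# Matches dagman.out lines such as:
#   02/26/17 20:54:20 Retrying node NODE_NAME (retry #1 of 3)...
# The pattern is an assumption based on the single log line quoted above.
RETRY_LINE = re.compile(
    r"Retrying node (?P<node>\S+) \(retry #(?P<retry>\d+) of (?P<max>\d+)\)"
)


def latest_retries(lines):
    """Return {node_name: (retry_number, max_retries)} for each node's latest retry."""
    retries = {}
    for line in lines:
        match = RETRY_LINE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last retry seen wins.
            retries[match.group("node")] = (
                int(match.group("retry")),
                int(match.group("max")),
            )
    return retries


sample = [
    "02/26/17 20:54:20 Retrying node "
    "inspiral-IMBHIMRPHENOMD_INJ-H1_ID513_ID0091456 (retry #1 of 3)...",
]
print(latest_retries(sample))
```

Any iterable of lines works, so an open file handle for the workflow's dagman.out can be passed directly in place of the sample list.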

Cheers,
Duncan.

P.S. I already looked for the obvious attributes:

[dbrown@sugwg-osg ~]$ condor_q -long 6659239 | grep -i dag
DAGManJobId = 6654561
DAGManNodesLog = "/usr1/dbrown/pycbc-tmp.Bd34wWMKf1/work/o2-analysis-5-c00-v1.6.7.1-main_ID0000001.000/./o2-analysis-5-c00-v1.6.7.1-main-0.dag.nodes.log"
DAGManNodesMask = "0,1,2,4,5,7,9,10,11,12,13,16,17,24,27"
DAGNodeName = "inspiral-NSBHSEOBNRV4_INJ-L1_ID194_ID0036993"
DAGParentNodeNames = "strip_injections-NSBHSEOBNRV4_INJ-H1L1_ID192_ID0030043,splitbank_ID6_ID0000010"
Environment = "PEGASUS_SITE=osg PEGASUS_DAG_JOB_ID=inspiral-NSBHSEOBNRV4_INJ-L1_ID194_ID0036993 PEGASUS_WF_LABEL=o2-analysis-5-c00-v1.6.7.1-main LAL_DATA_PATH=/cvmfs/oasis.opensciencegrid.org/ligo/sw/pycbc/lalsuite-extra/11/share/lalsimulation CONDOR_JOBID=6659239.0 PEGASUS_WF_UUID=e610dbe7-0cf7-47ac-93ed-a610812a12ca NO_TMPDIR=1"
JobBatchName = "o2-analysis-5-c00-v1.6.7.1-0.dag+6654560"
pegasus_wf_dag_job_id = "inspiral-NSBHSEOBNRV4_INJ-L1_ID194_ID0036993"
SubmitEventNotes = "DAG Node: inspiral-NSBHSEOBNRV4_INJ-L1_ID194_ID0036993"
[dbrown@sugwg-osg ~]$ condor_q -long 6659239 | grep -i retry
[dbrown@sugwg-osg ~]$ condor_q -long 6659239 | grep -i sub
AutoClusterAttrs = "CheckpointPlatform,JobUniverse,LastCheckpointPlatform,NumCkpts,DESIRED_Sites,is_itb,REQUIRED_OS,desired_arch,MachineLastMatchTime,DynamicSlot,PartitionableSlot,Slot1_SelfMonitorAge,_condor_RequestCpus,_condor_RequestDisk,RequestCpus,Slot1_TotalTimeClaimedBusy,WithinResourceLimits,DESIRED_XSEDE_Sites,ResidentSetSize,Slot1_TotalTimeUnclaimedIdle,ConcurrencyLimits,NiceUser,Rank,Requirements,DGA_FM,USER_COMMUNITY,GABE,NRG_NODE,CRG_NODE,_condor_RequestGPUs,RequestGPUs,GPU_NODE,SUBMIT_HOST,User,RemoteUser,InitialRequestMemory,NumJobStarts,OpenScienceGrid,Owner,RequestDisk,RequestMemory"
SUBMIT_HOST = "sugwg-osg.phy.syr.edu"
SubmitEventNotes = "DAG Node: inspiral-NSBHSEOBNRV4_INJ-L1_ID194_ID0036993"
SubmitEventUserNotes = "pool:osg"
TotalSubmitProcs = 1
User = "dbrown@xxxxxxxxxxxxxxxxxxxxxxxxx"
x509userproxysubject = "/DC=org/DC=cilogon/C=US/O=LIGO/CN=Duncan Brown duncan.brown@xxxxxxxx"
[dbrown@sugwg-osg ~]$ condor_q -long 6659239 | grep -i try
JOB_GLIDEIN_Entry_Name = "$$(GLIDEIN_Entry_Name:Unknown)"
JobAdInformationAttrs = "JOB_Site JOB_GLIDEIN_Entry_Name JOB_GLIDEIN_Name JOB_GLIDEIN_Factory JOB_GLIDEIN_Schedd JOB_GLIDEIN_ClusterId JOB_GLIDEIN_ProcId JOB_GLIDEIN_Site JOB_GLIDEIN_SiteWMS JOB_GLIDEIN_SiteWMS_Slot JOB_GLIDEIN_SiteWMS_JobId JOB_GLIDEIN_SiteWMS_Queue"



-- 

Duncan Brown                         http://dbrown10.expressions.syr.edu
Charles Brightman Professor of Physics     Room 263-1 Physics Department
Director of the Graduate Program      Syracuse University, NY 13244, USA
Phone: 315 443 5993                                    Fax: 315 443 9103