
[HTCondor-users] Force dag to continue after periodic_remove



Hi, 

I have approximately 20 Condor jobs (A1 ... A20) that are the parents of about 600 jobs (B1 ... B600) inside a *.dag file. I do not need all 20 jobs to complete successfully, but I do need the ones that do succeed to finish before jobs B1-B600 are started. I know that each of the 20 jobs should take about 10 minutes, and that a job taking much longer than that will probably never complete. Thus I added 

periodic_remove = CurrentTime-EnteredCurrentStatus > 1200

to the end of the *.sub files for these jobs, so that any job that had not completed after 20 minutes would be removed. However, my DAG also stops at this point and does not continue on to the 600 subsequent jobs, because it sees the removed job as a failure. Thus I added 

== 1)

to the *.sub files for the 20 jobs. I was trying to force a removed job to be seen as a success by the DAG, but this addition to the *.sub files caused my 20 jobs to keep recycling in the queue, and none of them completed. 
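
For concreteness, the setup looks roughly like the sketch below. The file names (a1.sub, b1.sub, run_a.sh and so on) are placeholders rather than my real ones, only two A nodes and two B nodes are shown, and the submit file shows only the periodic_remove line, not the second expression I tried.

```
# workflow.dag (placeholder name): every A node is a parent of every B node
JOB A1 a1.sub
JOB A2 a2.sub
JOB B1 b1.sub
JOB B2 b2.sub
PARENT A1 A2 CHILD B1 B2

# a1.sub (placeholder name): submit file for one of the A jobs
executable = run_a.sh
# remove the job once it has spent more than 1200 s (20 min) in its current state
periodic_remove = CurrentTime-EnteredCurrentStatus > 1200
queue
```

With this layout, B1 and B2 are not submitted until every A node has finished, and a removed A job counts as a failed node, which is what stops the DAG.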

Is there a way to force periodic_remove or something similar, to act like a success in the eyes of the dag, so that subsequent jobs get completed?

Cheers,
Ash