
Re: [Condor-users] Dagman and PeriodicRemove




On Jun 3, 2010, at 18:18, R. Kent Wenger wrote:

On Thu, 3 Jun 2010, Peter Doherty wrote:

I have a Condor DAG, and I have a PeriodicRemove statement in my classad that times out jobs after 30 minutes. I have Retry = 1 in my *.dag file to retry jobs that fail, either with a non-zero exit code, or that fail the POST script.

But if the job is removed by the PeriodicRemove statement, I don't want to run the POST script, and I don't want the job retried. I just want it removed, and nothing more done with it.

Can I make Condor do this?

There's no way to avoid running the POST script. However, if the POST script can recognize that the job was removed by the PeriodicRemove expression, you can avoid retrying the node. You can do this by having the POST script return a "special" exit code in that case, and using the UNLESS-EXIT option in your RETRY statements:

 RETRY JobName NumberOfRetries UNLESS-EXIT value
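As a rough illustration of how that RETRY line decides what happens, here is a simplified Python model of the behaviour described above. It is not DAGMan's code; it assumes the node's exit value is the POST script's exit code (since the node has a POST script), and the function name and parameters are hypothetical.

```python
from typing import Optional


def node_should_retry(node_exit: int, retries_used: int, max_retries: int,
                      unless_exit: Optional[int] = None) -> bool:
    """Simplified model of `RETRY JobName max_retries UNLESS-EXIT unless_exit`.

    node_exit is assumed to be the POST script's exit code.
    """
    if node_exit == 0:
        return False  # node succeeded, nothing to retry
    if unless_exit is not None and node_exit == unless_exit:
        return False  # the "special" code: give up on this node immediately
    return retries_used < max_retries  # ordinary failure: retry while retries remain
```

With `RETRY JobName 1 UNLESS-EXIT 42` (42 being an arbitrary choice), a POST script exiting 1 gets one retry, while one exiting 42 ends the node without retrying.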


Thanks Kent. I was able to make your suggestion work. I just had the POST script grep through the job's log file to find out if it was removed due to PeriodicRemove. I wasn't sure if there was a more elegant way of doing this.
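A POST script along the lines Peter describes could look like the Python sketch below. It assumes one job log per node, the classic text user-log format in which an abort is recorded as event 009 and the event body mentions "PeriodicRemove" when that expression caused the removal, and that the DAG passes the log path and DAGMan's `$RETURN` macro as arguments, e.g. `SCRIPT POST JobName check_removed.py JobName.log $RETURN`. The script name, log name and exit code 42 are hypothetical; the code must match the UNLESS-EXIT value.

```python
#!/usr/bin/env python3
import sys

# Hypothetical "special" exit code; must equal the UNLESS-EXIT value in the .dag file.
REMOVED_BY_PERIODIC_REMOVE = 42


def removed_by_periodic_remove(log_text: str) -> bool:
    """True if an abort event (code 009) in the log mentions PeriodicRemove."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("009 "):
            # An event's body runs until its "..." terminator line.
            for body in lines[i + 1:]:
                if body.strip() == "...":
                    break
                if "PeriodicRemove" in body:
                    return True
    return False


def main(log_path: str, job_return: int) -> int:
    with open(log_path) as f:
        if removed_by_periodic_remove(f.read()):
            return REMOVED_BY_PERIODIC_REMOVE
    # Map any other failure to 1 so it can never collide with the special code.
    return 0 if job_return == 0 else 1


# Only act when called with both arguments, as the SCRIPT POST line supplies them.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], int(sys.argv[2])))
```

Mapping every other non-zero job return to 1 keeps an unlucky job exit value of 42 from accidentally suppressing a legitimate retry.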

Peter