
[HTCondor-users] Possible to fail on hold?



Is there some way that I can get jobs that are put into a hold state to simply fail? Or, failing that, is there a PeriodicRelease expression I can use that will force a failure after a given number of hold/release cycles? It seems that NumShadowStarts would be close to what I want, but it isn't exactly the number of times the job started execution. Perhaps it's close enough?

 

I've got a situation with a very long-running job that was being killed for using too much memory. It was never going to succeed, but the default PeriodicRelease expression in use meant that it was retried up to 20 times. Our current default PeriodicRelease expression is:

 

                PeriodicRelease = ((JobStatus==5) && (CurrentTime - EnteredCurrentStatus) > 30)

 

I'm not clear how this ever allows the job to fail, but empirically, it does after a variable number of cycles. Anyway, I'm uncomfortable with the potentially infinite number of restarts this appears to imply...
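
To make the question concrete, the kind of bounded policy I have in mind would look roughly like the untested sketch below. It is written in submit-file syntax, it treats NumShadowStarts as a stand-in for the number of hold/release cycles (which, as noted above, it may not exactly be), and the limit of 5 is an arbitrary placeholder. I have not verified that it behaves as intended:

                # Assumption: NumShadowStarts approximates the number of hold/release cycles.
                # Release a held job (JobStatus == 5) after 30 seconds, but only while under the limit.
                periodic_release = (JobStatus == 5) && ((CurrentTime - EnteredCurrentStatus) > 30) && (NumShadowStarts < 5)

                # Once the (arbitrary) limit is reached, remove the held job instead of releasing it again.
                periodic_remove = (JobStatus == 5) && (NumShadowStarts >= 5)

Removal is not quite the same thing as the job "failing", so this only approximates what I am asking for.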

 

Thank You!

 

John

 

John Calley, Ph.D.

Research Advisor

Genomics and Bioinformatics,

Clinical Laboratory Sciences, CDDA, LRL

Eli Lilly and Company

Lilly Corporate Center, Indianapolis, IN 46285 USA

317.433-3399 (office)
calley_john_n@xxxxxxxxx | www.lilly.com 
