
[HTCondor-users] How to forbid job restarts



 

Hello,

 

After an extensive web search, I cannot seem to find an answer to a simple question: how do I forbid HTCondor from restarting my jobs?

 

I have a type of job that I used to run as independent jobs, and HTCondor always allowed them to finish. I have since reworked the process, in theory to make it more efficient, so that those jobs run as a DAG consisting of many (hundreds of) independent graphs, i.e. with no parent/child links between them. Now HTCondor does not let the jobs finish: it keeps restarting them (after about an hour of running) before they can complete. NumJobStarts keeps incrementing, and the run time of a job as shown by Linux top keeps resetting to zero.

 

How can I tell HTCondor that restarting jobs is forbidden and that all jobs must be allowed to finish, no matter how long they take?

 

What could cause the jobs to start restarting periodically when they are run as part of a DAG?

 

I am the administrator of my HTCondor cluster, so I am sure that neither the HTCondor configuration parameters nor the individual job submit files were changed.

 

Thank you very much for your help,

Siarhei.
