
Re: [HTCondor-users] How to forbid job restarts



Hello, I'm not sure this helps in your case, but anyway:

In the DAG file, one can specify:

JOB A A.sub
RETRY A 5
# see also UNLESS-EXIT, e.g. "RETRY A 5 UNLESS-EXIT 2": no further retries if the node exits with code 2

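Presumably RETRY A 0 would disable retries of DAG nodes (which is also what you get with no RETRY line at all). Note, though, that RETRY only resubmits nodes whose job failed; it does not stop the schedd from restarting a job that was evicted while running.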
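To make the RETRY semantics concrete, here is a rough Python model of which attempts a node gets for a given sequence of exit codes. It is only a sketch, not DAGMan's code, and the function name run_node is made up. It assumes that a node is retried only when its job exits with a non-zero code and that UNLESS-EXIT stops the retries when the failing code matches, and it ignores PRE/POST scripts. A job that the schedd evicts and restarts never shows up in such a model, because the job has not exited at all.

```python
from typing import Iterable, List, Optional


def run_node(exit_codes: Iterable[int], retries: int,
             unless_exit: Optional[int] = None) -> List[int]:
    """Return the exit codes of the attempts a DAG node would get.

    Hypothetical model of "RETRY <node> <retries> [UNLESS-EXIT <unless_exit>]":
    the first attempt always runs; a failed attempt (non-zero exit code) is
    retried until `retries` retries are used up, unless its exit code equals
    `unless_exit`. PRE/POST scripts are ignored.
    """
    attempts: List[int] = []
    for code in exit_codes:
        attempts.append(code)
        if code == 0:
            break  # success: the node is done
        if unless_exit is not None and code == unless_exit:
            break  # UNLESS-EXIT matched: the node fails without further retries
        if len(attempts) > retries:
            break  # initial attempt plus `retries` retries used up: node fails
    return attempts


if __name__ == "__main__":
    print(run_node([1, 1, 0], retries=5))                 # [1, 1, 0]
    print(run_node([1, 1], retries=0))                    # [1]
    print(run_node([1, 2, 1], retries=5, unless_exit=2))  # [1, 2]
```

With retries=0 a failing node gets exactly one attempt, which is the behaviour RETRY A 0 is meant to give.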

For general (non-DAG) jobs, I've set the following job transform rule and periodic policy in the schedd configuration:

JOB_TRANSFORM_NoRestart @=end
   REQUIREMENTS True
   if defined My.Requirements
      SET Requirements (NumJobStarts == 0) && ( $(My.Requirements) )
   else
      SET Requirements (NumJobStarts == 0)
   endif
@end

SYSTEM_PERIODIC_HOLD = ( $(SYSTEM_PERIODIC_HOLD:False) || (NumJobStarts == 1 && JobStatus == 1) )
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*6)
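If it helps to see how the pieces interact, here is a rough Python model of the policy above. It is only a sketch over a dict of job attributes: it does not evaluate real ClassAds, it ignores the machine side of matchmaking, and the function names are made up. The transform makes a job unmatchable once NumJobStarts is non-zero, so a job evicted after its first start goes back to idle (JobStatus 1) and stays there; SYSTEM_PERIODIC_HOLD then puts it on hold (JobStatus 5), and SYSTEM_PERIODIC_REMOVE removes it after more than six hours on hold.

```python
from typing import Callable, Dict, Optional

# JobStatus codes referenced by the expressions above.
IDLE, RUNNING, HELD = 1, 2, 5

Job = Dict[str, int]  # simplified job ClassAd: attribute name -> value
Predicate = Callable[[Job], bool]


def no_restart_requirements(original: Optional[Predicate] = None) -> Predicate:
    """Mimic JOB_TRANSFORM_NoRestart: (NumJobStarts == 0) && (original).

    `original` stands in for the job's pre-existing Requirements, if any.
    """
    def requirements(job: Job) -> bool:
        if job["NumJobStarts"] != 0:
            return False  # already started once: never match again
        return original(job) if original is not None else True
    return requirements


def periodic_hold(job: Job) -> bool:
    """The added SYSTEM_PERIODIC_HOLD clause: started once, now idle again."""
    return job["NumJobStarts"] == 1 and job["JobStatus"] == IDLE


def periodic_remove(job: Job, now: int) -> bool:
    """SYSTEM_PERIODIC_REMOVE: on hold for more than 6 hours."""
    return job["JobStatus"] == HELD and now - job["EnteredCurrentStatus"] > 3600 * 6


def policy_action(job: Job, now: int, requirements: Predicate) -> str:
    """What the policy would do with `job` at time `now` (hold and remove
    never both apply, since one needs IDLE and the other HELD)."""
    if periodic_hold(job):
        return "hold"
    if periodic_remove(job, now):
        return "remove"
    if job["JobStatus"] == IDLE and requirements(job):
        return "may match"
    return "no change"


if __name__ == "__main__":
    reqs = no_restart_requirements()
    job = {"NumJobStarts": 0, "JobStatus": IDLE, "EnteredCurrentStatus": 0}
    print(policy_action(job, 0, reqs))                    # may match
    # Evicted after running for about an hour: idle again, with one start.
    job.update(NumJobStarts=1, JobStatus=IDLE, EnteredCurrentStatus=3700)
    print(policy_action(job, 3700, reqs))                 # hold
    job.update(JobStatus=HELD, EnteredCurrentStatus=3760)
    print(policy_action(job, 3760 + 3600, reqs))          # no change
    print(policy_action(job, 3760 + 6 * 3600 + 1, reqs))  # remove
```

Note that SYSTEM_PERIODIC_REMOVE as written applies to every job that has been on hold for more than six hours, not only to the ones held by this rule.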

Stefano


On 11/01/21 19:18, Vaurynovich, Siarhei wrote:

Hello,

After an extensive web search, I cannot find an answer to a simple question: how do I forbid HTCondor from restarting my jobs?

I have a type of job that I used to run as independent jobs, and HTCondor always allowed them to finish. I have since changed the process to be, in theory, more efficient by running those jobs as a DAG consisting of hundreds of independent graphs (i.e. no parent/child links between them). Now HTCondor does not let the jobs finish: it keeps restarting them after about an hour of running, before they can complete (NumJobStarts keeps incrementing, and the run time of each job as seen in Linux top keeps being reset to zero).

How can I tell HTCondor that it must not restart jobs, and that all jobs should be allowed to finish no matter how long they take?

Why would the jobs start restarting periodically when run as part of a DAG?

I am the administrator of my HTCondor cluster, so I am sure that neither the HTCondor configuration nor the individual job submit files were changed.

Thank you very much for your help,

Siarhei.

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/