
Re: [HTCondor-users] Jobs restarting

Great stuff, I will try this as well as Ben's suggestion.

Thank you,

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: 20. november 2015 00:18
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Jobs restarting

On 11/18/2015 5:28 AM, Peter Ellevseth wrote:
> Hello all
> We have some trouble with condor restarting our jobs. This happens 
> when there is some disturbance (backup job locking the disc) and the 
> head loses touch with the working nodes. I have two questions
> 1.How can I change the time it takes before the head node orders a 
> restart of a job.

If the submit machine fails to hear from the execute machine for more than X seconds, where X is defined by JobLeaseDuration in the job's submit file, then the job will be killed and restarted (potentially someplace else).

By default, X is either 20 minutes or 40 minutes (depending on the HTCondor version).

You can set it explicitly in your job's submit file, e.g.

   executable = foo.exe
   JobLeaseDuration = 3600

Or you can specify a default in the condor_config file that condor_submit will pick up and use, e.g. append the following to your condor_config

   JobLeaseDuration = 3600
   SUBMIT_EXPRS = $(SUBMIT_EXPRS) JobLeaseDuration

Some details in the Manual are at http://is.gd/ShifW8
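
For context, here is a rough sketch of how the setting might sit in a complete submit file. It uses the submit command spelling job_lease_duration from the manual; the universe and the output, error and log file names are placeholders I made up for illustration, so adjust them to your setup. The value is in seconds, so 3600 is one hour.

   # Sketch only: foo.out, foo.err and foo.log are hypothetical names
   universe           = vanilla
   executable         = foo.exe
   output             = foo.out
   error              = foo.err
   log                = foo.log
   # Seconds the submit side waits without contact before giving up on the job
   job_lease_duration = 3600
   queue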

> 2.Is it possible to change what is done when a restart is issued. 
> Could I, instead of condor sending a SIGKILL to the job, tell it to 
> run a script that shuts the job down safely?

I think Ben gave suggestions for this question in an earlier post...

> It would be preferable to have
> condor shut the job quietly down instead of restarting it.

Do you mean you don't want the job to restart?  I.e. you want to run the job once, and if there is a problem, have the job leave the queue instead of restarting?  If so, see the HOWTO at https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAvoidJobRestarts
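
As a rough illustration of that idea (an assumption on my part, not necessarily the exact recipe from the HOWTO), a submit file could use periodic_remove together with the NumJobStarts job attribute so that a job which has already started once is removed when it returns to the idle state. Note that NumJobStarts may not be updated in the queue immediately, so treat this as a sketch and check the HOWTO for the caveats.

   # Sketch only: remove the job rather than let it start a second time
   # JobStatus == 1 means the job is idle; NumJobStarts counts how often it has started
   periodic_remove = (JobStatus == 1) && (NumJobStarts > 0)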

Hope the above helps
