[Condor-users] Preventing executing jobs from terminating when a schedd machine reboots



We occasionally get patches pushed to our Windows desktops by our
IS department, and with them come forced reboots of our machines.
This can be a real pain if the forced reboot happens to coincide with
a time when you're trying to get a long-running vanilla job through
the system. If I have a job that's been running for 2 days and needs
another day, losing it to a forced reboot is really frustrating.

I've added:

job_lease_duration = 720

to my submission ticket, submitted a series of jobs, waited for a few
of them to begin executing, and then rebooted the Windows machine from
which they were scheduled. All the jobs were immediately vacated.
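
For reference, a submission ticket of roughly this shape reproduces it.
This is only a sketch: the executable, file names and queue count are
placeholders rather than my actual job, and the note on units reflects
my understanding that job_lease_duration is given in seconds.

```
# Sketch of a vanilla-universe submit file (placeholder names throughout).
universe   = vanilla
executable = long_job.exe
output     = long_job.$(Cluster).$(Process).out
error      = long_job.$(Cluster).$(Process).err
log        = long_job.log

# Lease length in seconds (as I understand it), so 720 is 12 minutes.
job_lease_duration = 720

# Placeholder count; I submitted "a series" of jobs.
queue 5
```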

How can I stop this from happening? My only guess is that the schedd
daemon runs a shutdown routine when the service is terminated, and that
this routine is what actually vacates the jobs on the startds. Is this
correct?

- Ian

--
Ian R. Chesal <ichesal@xxxxxxxxxx>
Senior Software Engineer

Altera Corporation
Toronto Technology Center
Tel: (416) 926-8300