Re: [Condor-users] Effect of condor_off -peaceful on schedd
- Date: Tue, 18 Apr 2006 16:27:47 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Effect of condor_off -peaceful on schedd
Currently, for all daemons other than the startd, condor_off -peaceful
is equivalent to condor_off -graceful. In the case of the schedd, this
means that it will force jobs to vacate, which is not what you want.

We are planning to improve support for -peaceful shutdown/restart of
the schedd, but instead of just waiting for all jobs to finish, we are
hoping to take advantage of restartable starter-shadow connections when
job_lease_duration is being used.

Currently, if you want to quickly reboot the schedd and your jobs are
using job_lease_duration, you have to kill -9 the schedd and its
shadows, and then restart the schedd, which will then start up shadows
to reconnect to the jobs that were still running. Note, however, that
restartable shadow-starter connections do not currently work for jobs
that are running in a remote pool via Condor flocking, which is another
item on the TODO list.
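
For reference, here is a minimal sketch of a submit description file
that requests a job lease, so that a restarted schedd has a chance to
reconnect to the running job. The executable name and the lease value
are made up for illustration; only job_lease_duration itself comes from
the discussion above, and its value is given in seconds.

```
# Hypothetical submit file; my_job.sh and the 1200-second lease are
# illustrative assumptions, not recommended values.
universe           = vanilla
executable         = my_job.sh
job_lease_duration = 1200
queue
```

The lease should be long enough to cover the time the schedd is expected
to be down. If it expires before the shadow reconnects, the job on the
execute side is no longer kept alive for reconnection.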
On Apr 17, 2006, at 10:58 AM, Steven Timm wrote:
We have, in the past, used condor_off -peaceful to shut down a number
of worker nodes in our cluster for maintenance and it has done what
we expected it to do, namely, keep the startd from starting any more
jobs and finish the one that is already running.
My question is--what if we did
condor_off -all -peaceful
on the head node that is, in our configuration, running schedd,
collector, and negotiator? What would be the result?
It would be nice to get the schedd in a state such that it
would let any currently-running jobs finish, and record that
they had finished, but not let any new ones start. Would
that be the effect of condor_off -peaceful on a schedd, or would
the effects be totally unpredictable?
Steven C. Timm, Ph.D (630) 840-8525 timm@xxxxxxxx
Fermilab Computing Div/Core Support Services Dept./Scientific
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team