
Re: [Condor-users] pool drainoff




How can I put a single node in a Condor pool into a
'drainoff' state, that is, let any jobs currently running on
the node finish, but not accept new jobs?

It should be:

	condor_off -peaceful

In theory that will shut down the machine once all the running jobs
have left. In practice I find that if one job takes an incredibly long
time to run, new jobs keep getting assigned to the machine and a
peaceful point to shut down is never reached. That's with 6.8.6 (yeah,
Condor guys, I know: why don't I tell you about these things? Sometimes
it just slips my mind... :) ).


In practice I've found two gotchas with this approach:
(1) You have to execute condor_off -peaceful individually
for each startd in the pool.  If you just do a global
condor_off -peaceful, it will kill the schedds and negotiators
well before the startds go off, and you won't get the
desired result (the jobs will all finish, but Condor
will never know about it).  Condor needs a feature that
automatically shuts down the startds first and then the schedds
and collector/negotiators.

(2) If you execute condor_off -peaceful for a lot of nodes
in rapid succession, it sends the collector into a dance of death
from which it can take hours to extract itself, and condor_status
will time out in the meantime.  Supposedly that will
be fixed in Condor 7.0.2.
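
The two gotchas above suggest turning off the startds one node at a time and leaving a pause between nodes. Below is a minimal sketch of that idea in Python, not something from the Condor distribution. Assumptions: the host names, the helper names (peaceful_off_command, drain_nodes) and the 30-second pause are made up for illustration, and the -subsystem startd and -name options should be checked against the condor_off man page for your version before use.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def peaceful_off_command(host: str) -> List[str]:
    """Build a condor_off call aimed at the startd on a single host.

    Assumption: this condor_off accepts -subsystem and -name; verify
    against the man page for the installed Condor version.
    """
    return ["condor_off", "-peaceful", "-subsystem", "startd", "-name", host]


def drain_nodes(
    hosts: Iterable[str],
    delay_seconds: float = 30.0,  # assumed pause, tune for your collector
    run: Callable[[List[str]], object] = subprocess.check_call,
    sleep: Callable[[float], object] = time.sleep,
) -> List[List[str]]:
    """Issue one peaceful-off per host, pausing between hosts.

    Only startds are targeted, so schedds and negotiators stay up,
    and the pause keeps the requests from arriving in rapid succession.
    """
    issued = []
    for index, host in enumerate(hosts):
        if index > 0:
            sleep(delay_seconds)  # space out the requests
        command = peaceful_off_command(host)
        run(command)
        issued.append(command)
    return issued


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    drain_nodes(
        ["node01.example.org", "node02.example.org"],  # hypothetical hosts
        delay_seconds=0,
        run=lambda command: print(" ".join(command)),
    )
```

Passing a different run callable (as in the dry run) makes it easy to review the command lines before letting the script touch a live pool.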

The other two features I've wanted for a long time are (1) an instruction
to tell a schedd to start all its existing jobs but not
accept any new ones, and (2) an instruction to let the jobs already
running on a schedd complete but not start any more.  (Yes,
I know the latter could be accomplished with condor_hold -constraint ...)

Steve





I thought I could do this by setting 'START=False' in the
node-specific condor_config.local, followed by
'condor_reconfig -subsystem startd' on the node, but that
doesn't seem to have worked.  The node is still starting new jobs.

Hmm...try:

	condor_reconfig -startd -full

But my gut feeling is that START = False is going to immediately
vacate the running jobs.

- Ian




