Re: [HTCondor-users] Drain HTCondor worker by setting instance metadata value

On 9/5/2017 1:49 PM, Dimitri Maziuk wrote:
> On 09/05/2017 01:28 PM, Todd Tannenbaum wrote:
>> On 9/5/2017 1:19 PM, Dimitri Maziuk wrote:
>>> On 09/05/2017 11:28 AM, Todd Tannenbaum wrote:
>>>>
>>>>      condor_drain <machine-name>
>>>>
>>> Quick question: will it reset if I bounce the node or will I need to run
>>> condor_drain -cancel after reboot?
>>
>> It will reset after rebooting the node.  No need for -cancel.
>
> Thank you, but condor_drain -graceful has just SIGTERM'ed the running
> jobs which is not quite the same as setting START to false and running
Good point.

If you don't want HTCondor to preempt (i.e., SIGTERM) a running job unless the job has already run for over X seconds, set MaxJobRetirementTime to X in the condor_config on the execute node. condor_drain -graceful will honor the MaxJobRetirementTime setting, as will preemption for any other reason, e.g., user priority, the startd rank expression, the preempt expression, etc. See
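
To make the retirement rule concrete, here is a minimal Python sketch. It is a hypothetical model, not HTCondor code: the function name may_preempt and its parameters are made up for illustration, and it ignores details such as a job finishing on its own during its retirement window. It only encodes the rule stated above, with a retirement time of 0 meaning "preempt immediately".

```python
# Hypothetical model of the MaxJobRetirementTime rule; NOT HTCondor code.
# A running job may be preempted (SIGTERM'ed) only once it has run for at
# least max_job_retirement_time seconds, counted from when the job started.

def may_preempt(job_runtime_seconds: int, max_job_retirement_time: int) -> bool:
    """Return True if the job's retirement window has expired."""
    return job_runtime_seconds >= max_job_retirement_time


if __name__ == "__main__":
    one_hour = 60 * 60
    print(may_preempt(30 * 60, one_hour))      # ran 30 min of a 1 h window -> False
    print(may_preempt(2 * one_hour, one_hour))  # ran 2 h of a 1 h window -> True
    print(may_preempt(5, 0))                    # retirement time 0 -> True
```
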


If you want to configure things so that preemption of a job is delayed
only in the case of draining, but the job is still preempted immediately
in the case of user priority, rank, preempt expression, etc., note that
MaxJobRetirementTime is a ClassAd expression evaluated in the context of
the slot ad. So if you put something like the following in the
condor_config on your execute machine:

   MaxJobRetirementTime = ifThenElse(Draining =?= True, 8*60*60, 0)

it will tell HTCondor on that execute node to let jobs keep running unmolested for up to eight hours (counted, as above, from when each job started) once the node receives a condor_drain command, while jobs are still preempted immediately for any other reason. The key here is that the condor_startd helpfully sets the slot attribute
Draining = True whenever the slot is in the draining state.
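
Why =?= rather than ==? The Python sketch below is a simplified, hypothetical model of ClassAd evaluation (it does not use the real classad bindings, and the helper names are invented for illustration). It assumes that Draining is simply absent, i.e. UNDEFINED, from the slot ad when the slot is not draining. Under that assumption, Draining =?= True evaluates to a definite False, so the expression yields 0, whereas Draining == True would be UNDEFINED and ifThenElse() would pass that UNDEFINED through instead of returning 0.

```python
# Hypothetical, simplified model of ClassAd evaluation; NOT the real classad
# library. UNDEFINED stands in for an attribute missing from the slot ad.

UNDEFINED = object()  # sentinel for the ClassAd UNDEFINED value


def meta_equal(a, b):
    """Model of =?= : same type and same value; never yields UNDEFINED."""
    return type(a) is type(b) and a == b


def strict_equal(a, b):
    """Model of == : UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def if_then_else(cond, then_value, else_value):
    """Model of ifThenElse(): an UNDEFINED condition yields UNDEFINED."""
    if cond is UNDEFINED:
        return UNDEFINED
    return then_value if cond else else_value


def max_job_retirement_time(slot_ad):
    """Evaluate ifThenElse(Draining =?= True, 8*60*60, 0) against a slot ad."""
    draining = slot_ad.get("Draining", UNDEFINED)
    return if_then_else(meta_equal(draining, True), 8 * 60 * 60, 0)


if __name__ == "__main__":
    print(max_job_retirement_time({"Draining": True}))  # draining slot -> 28800
    print(max_job_retirement_time({}))                  # not draining -> 0
    # Using == instead leaves the result UNDEFINED on a non-draining slot:
    print(if_then_else(strict_equal(UNDEFINED, True), 8 * 60 * 60, 0) is UNDEFINED)  # True
```
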

Hope the above helps,

So it appears "instead of twiddling with START expressions ... simply
invoke the condor_drain" is not entirely correct.

