
Re: [HTCondor-users] Drain HTCondor worker by setting instance metadata value



On 9/5/2017 1:49 PM, Dimitri Maziuk wrote:
On 09/05/2017 01:28 PM, Todd Tannenbaum wrote:
On 9/5/2017 1:19 PM, Dimitri Maziuk wrote:
On 09/05/2017 11:28 AM, Todd Tannenbaum wrote:

     condor_drain <machine-name>

Quick question: will it reset if I bounce the node or will I need to run
condor_drain -cancel after reboot?


It will reset after rebooting the node.  No need for -cancel.

Thank you, but condor_drain -graceful has just SIGTERM'ed the running
jobs which is not quite the same as setting START to false and running
condor_reconfig.


Good point.

If you don't want HTCondor to preempt (i.e. SIGTERM) a running job unless the job has already run for over X seconds, set MaxJobRetirementTime to X in condor_config on the execute node. condor_drain -graceful will honor MaxJobRetirementTime, as will preemption for any other reason, e.g. user priority, startd rank expression, preempt expression, etc. See

http://research.cs.wisc.edu/htcondor/manual/v8.6/3_5Configuration_Macros.html#25630
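
For example, a minimal sketch of the simple case, assuming you want every job on this node to get up to two hours of runtime before any preemption (the two-hour figure is purely illustrative, not a recommendation):

   # condor_config on the execute node
   # Illustrative value only: 2 hours, expressed in seconds
   MaxJobRetirementTime = 2*60*60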

If you want to configure things so preemption of a job is only delayed
in the case of draining, but you still want the job to be immediately
preempted in the case of user priority/rank/preempt expression, etc,
note that MaxJobRetirementTime is a classad expression evaluated in the
context of the slot ad. So if you put in the condor_config on your execute machine something like:

   MaxJobRetirementTime = ifThenElse(Draining =?= True, 8*60*60, 0)

it will tell HTCondor on that execute node to allow jobs to continue running unmolested for up to eight hours when the node receives a condor_drain command, while preemption for any other reason still happens immediately. The key here is that the condor_startd helpfully sets the slot attribute
Draining=True whenever it is in draining state.
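
A rough sketch of how one might apply and check this from the command line, assuming a reconfig is enough for the startd to pick up the change and with <machine-name> standing in for your execute node (what gets printed will depend on your pool and version):

   # on the execute node, after editing condor_config
   condor_reconfig
   # confirm the expression the daemons will see
   condor_config_val MaxJobRetirementTime
   # start a graceful drain, then look at the slot's Draining attribute
   condor_drain -graceful <machine-name>
   condor_status <machine-name> -af Name State Activity Draining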

Hope the above helps,
Todd




So it appears "instead of twiddling with START expressions ... simply
invoke the condor_drain" is not entirely correct.






--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685