
Re: [HTCondor-users] How to set a worker node offline in HTCondor



Hi,

Yeah, names are important. In that regard, a better name for this command is condor_vacate - if you don't tell it anything else (no maxjobretirementtime) then it causes all jobs to vacate the node. "Drain" to me takes time - a full bathtub doesn't have all the water disappear at once :) Drain is also passive - you pull the plug and the water slowly flows out of the tub.

Btw it was Torque, not SLURM.

JT

On 31 Mar 2021, at 22:32, Todd Tannenbaum wrote:

On 3/31/2021 2:09 PM, gthain@xxxxxxxxxxx wrote:
On 3/31/21 1:55 PM, templon@xxxxxxxxx wrote:
 

What is the corresponding simplest way to achieve exactly this in HTCondor?

Note the word "exactly" :)

The answer was the condor_drain command, but it does not achieve exactly this, without a bit more. condor_drain also evicts running jobs from slots, depending on what the value of MaxJobRetirementTime is. I did not know about this variable so we did not have it set, and aside from nodes not accepting new jobs (the question), they stopped running the already-running jobs - not the desired behavior.


Yes, apologies for this Jeff!  I had forgotten our pool sets MaxJobRetirementTime. Indeed, as you discovered, I suggest you set MaxJobRetirementTime as documented, i.e. set it in your config to be how long a job should be able to run without being interrupted by HTCondor ... note this is a ClassAd expression that can reference attributes in the job itself if you desire.  Alternatively, using condor_off -peaceful as Greg suggested is another option.
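
As a rough illustration of the setting described above, a worker-node config fragment might look like the sketch below. The 48-hour value, the file it lives in, and the WantLongRetirement job attribute are assumptions made up for this example, not anything from this thread or a recommended policy.

```
# Worker-node (startd) configuration: an untested sketch.
# MAXJOBRETIREMENTTIME is an expression giving how many seconds a job may
# run, counted from when it started, before HTCondor may evict it.
# 48 hours is an arbitrary illustrative value.
MAXJOBRETIREMENTTIME = 48 * 3600

# Alternative form that looks at the job ad. WantLongRetirement is a
# hypothetical custom attribute that a job would have to define itself.
# MAXJOBRETIREMENTTIME = ifThenElse(TARGET.WantLongRetirement =?= True, 7 * 24 * 3600, 24 * 3600)
```

With something like this in place, condor_drain on a node should stop new jobs from starting while already-running jobs are allowed to continue until their retirement time is used up.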

While you can configure all kinds of time-based policies (e.g. maximum run time until killed, maximum run time until candidate for preemption, etc.) today using the flexibility and ability to insert customized attributes offered by HTCondor's ClassAds, we plan to look at how to make these sorts of policies more "first-class".  Doing so would perhaps simplify their use, and at the very least standardize how users specify these time limits.

regards
Todd
