Re: [HTCondor-users] Cancel peaceful shutdown / cgroupsv2
- Date: Wed, 24 Jul 2019 21:57:05 +0200 (CEST)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: Re: [HTCondor-users] Cancel peaceful shutdown / cgroupsv2
good to hear from you :)
If your jobs have an estimated or enforced runtime, you can alter the START expression on the worker nodes so that it takes a shutdown time into account before starting a job, and make that setting remotely administrable:
On the worker:
InStageDrain = False
ShutdownTime = 0
Drain = ((InStageDrain =?= True && (time() + MaxJobRetirementTime < ShutdownTime)) || InStageDrain =?= False)
STARTD_ATTRS = InStageDrain, ShutdownTime, StartJobs, $(STARTD_ATTRS)
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs, InStageDrain, ShutdownTime
START = (your other start options) && $(Drain)
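To make the logic of the Drain expression easier to follow, here is a minimal Python sketch of the same decision. It is only an illustration, not HTCondor code: the function name drain_allows_start and its arguments are made up for this example, and it ignores ClassAd UNDEFINED handling apart from treating an unset InStageDrain as "do not start".

```python
def drain_allows_start(in_stage_drain, now, max_job_retirement_time, shutdown_time):
    """Illustrative mirror of the Drain expression (hypothetical helper).

    Returns True if the slot may start the job, False otherwise.
    All times are in seconds (Unix time for now and shutdown_time).
    """
    if in_stage_drain is True:
        # Draining: only start jobs that are expected to end before the shutdown.
        return now + max_job_retirement_time < shutdown_time
    # Not draining: always start. Anything other than True/False
    # (e.g. None for an unset attribute) does not match either branch.
    return in_stage_drain is False


shutdown_time = 1559221188            # value used in the example in this mail
now = shutdown_time - 12 * 3600       # assume we are 12 hours before the shutdown

print(drain_allows_start(True, now, 6 * 3600, shutdown_time))    # 6 h job
print(drain_allows_start(True, now, 24 * 3600, shutdown_time))   # 24 h job
print(drain_allows_start(False, now, 24 * 3600, shutdown_time))  # drain switched off
```

With the node in drain mode, the 6 hour job fits into the remaining 12 hours and the 24 hour job does not; with InStageDrain = False the shutdown time is ignored.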
Get the shutdown time of the node as Unix time, for example:
zitpcx35701% date -d "May 30 14:59:48 CEST 2019" +%s
condor_config_val -name <workernode> -startd -set "ShutdownTime = 1559221188"
condor_config_val -name <workernode> -startd -set "InStageDrain = True"
condor_reconfig <workernode> -daemon startd
From then on, this node will only start jobs that fit into the time remaining before the shutdown.
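If GNU date is not at hand, the ShutdownTime value can be cross-checked with a few lines of Python. This is just a sketch: it assumes the shutdown is given in CEST (UTC+2), which is hard-coded here as a fixed offset rather than looked up from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the shutdown time is expressed in CEST, i.e. UTC+2.
CEST = timezone(timedelta(hours=2), "CEST")

shutdown = datetime(2019, 5, 30, 14, 59, 48, tzinfo=CEST)
shutdown_time = int(shutdown.timestamp())

# Prints the line to pass to condor_config_val -set
print(f"ShutdownTime = {shutdown_time}")   # ShutdownTime = 1559221188
```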
Building 02b, Room 009
----- Original Message -----
From: "Carsten Aulbert" <carsten.aulbert@xxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 24 July 2019 21:35:05
Subject: [HTCondor-users] Cancel peaceful shutdown / cgroupsv2
two quick questions which possibly don't warrant a full email on their own:
(1) After running condor_off -peaceful -daemon startd on a node, is it
possible to cancel this?
Background: We have to shuffle servers around and want to be nice
admins. Therefore, we stop the nodes peacefully 12+ hours in advance
before powering them off - and killing jobs which are still running then.
However, sometimes plans change and we have nodes where say 10 cores are
idle due to the shutdown but 2 cores are busy with jobs running for
another day or so. Then we have the choice to let these jobs finish and
"waste" the idle cores or kill those for the greater benefit. Or is
there a third option?
(2) I have not found it in the docs and I will only be able to start
testing this with condor 8.8 on Debian Buster after your next point
release, but do you already support cgroups v2? Our preliminary testing
has shown it to be quite powerful in terms of preventing jobs going into
swap while allowing processes outside of condor to use swap if needed.
Cheers and thanks a lot in advance
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185