
[HTCondor-users] Can we preferably defrag the largest machines?



Hi all,

As our pool has become more and more heterogeneous over time, I am
currently running into a brick wall trying to persuade condor to
preferably defrag the largest machines.

The current situation:

User A wants to have many CPU cores (say 96), and we have a number of
machines with 128 cores. Unfortunately for him, we currently have a
bunch of idle 4, 6 and 32 core machines in the same pool, and thus
condor_defrag does not want to defrag the large machines because, from
its point of view, the target number of defragmented machines has
already been reached.
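
To make the situation concrete, here is a toy Python model of what I
think is happening. It is not condor_defrag's actual logic: the machine
counts are made up, and I am assuming that a machine counts as "whole"
when all of its cores are free and that draining stops once enough whole
machines exist.

```python
# Toy model: idle small machines count as "whole" and use up the cap.
# Machine counts below are hypothetical, not taken from our pool.
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    total_cpus: int
    free_cpus: int


def is_whole(m):
    # Assumed definition of a "whole" machine: every core is unclaimed.
    return m.free_cpus == m.total_cpus


def wants_to_drain(machines, max_whole):
    # Assumed throttle: stop draining once enough whole machines exist.
    whole = sum(1 for m in machines if is_whole(m))
    return whole < max_whole


pool = (
    [Machine(f"small{i}", 4, 4) for i in range(25)]     # idle 4-core nodes
    + [Machine(f"big{i}", 128, 16) for i in range(10)]  # fragmented 128-core nodes
)

# All machines considered: the idle small nodes alone reach the cap.
print(wants_to_drain(pool, max_whole=20))

# Only machines with >= 64 cores considered: no whole machine yet.
large = [m for m in pool if m.total_cpus >= 64]
print(wants_to_drain(large, max_whole=20))
```

In this sketch the first call says "nothing to do" and the second says
"drain", which is the behaviour I would like to get for the big nodes.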

I tried to steer it somewhat with DEFRAG_REQUIREMENTS, but so far to
no effect:

07/21/20 10:55:15        a8005.atlas.local
07/21/20 10:55:15 Newly departed draining machines is
07/21/20 10:55:15 (no machines)
07/21/20 10:55:15 Arrival rate is 2.9975 machines/hour
07/21/20 10:55:15 Lifetime whole machines arrived: 2
07/21/20 10:55:15 Lifetime mean arrival rate: 2.19111 machines / hour
07/21/20 10:55:15 Lifetime mean arrival rate sd: 0.570203
07/21/20 10:55:15 Average pool draining badput = 454.53%
07/21/20 10:55:15 Average pool draining unclaimed = 1822.92%
07/21/20 10:55:15 Doing nothing, because number to drain in next 300s is calculated to be 0.

(the newly arrived machines were machines just coming back after repairs)

current configuration:

condor_config_val -sum | grep DEFRAG
# from /etc/condor/config.d/20_DEFRAG
DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR DEFRAG
DEFRAG_INTERVAL = 300
DEFRAG_DRAINING_MACHINES_PER_HOUR = 5.0
DEFRAG_MAX_WHOLE_MACHINES = 20
DEFRAG_MAX_CONCURRENT_DRAINING = 10
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True && TotalCpus >= 64
DEFRAG_DRAINING_START_EXPR = (KillableJob =?= true)
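
As a back-of-the-envelope check on the "calculated to be 0" message, the
following Python snippet works out a naive per-cycle drain budget from
the two rate settings above. This is only my assumption of how the
budget might be derived (rate times interval, rounded down); I have not
checked how the daemon really handles the fractional remainder between
cycles.

```python
# Naive per-cycle drain budget from the settings above.
# Assumption: budget = rate * interval, rounded down; how the daemon
# actually carries fractional remainders between cycles is not modelled.
import math

DEFRAG_INTERVAL = 300                      # seconds, from the config above
DEFRAG_DRAINING_MACHINES_PER_HOUR = 5.0    # from the config above

per_cycle = DEFRAG_DRAINING_MACHINES_PER_HOUR * DEFRAG_INTERVAL / 3600
print(f"{per_cycle:.4f} machines per cycle")
print(math.floor(per_cycle))
```

With less than one machine per 300 s cycle, a rounded-down budget of 0
in any single cycle would at least be consistent with the log line,
independent of DEFRAG_REQUIREMENTS.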

Any idea if this is achievable with the current set of knobs and what I
am missing?

Cheers

Carsten

PS: I have not yet played with DEFRAG_RANK, as I guess it will only
come into play once defrag actually wants to do something.

-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
