Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Drained 0 machines

Date: Fri, 17 Jul 2020 21:56:43 +0200
From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Drained 0 machines



On 17/07/20 21:27, Beyer, Christoph wrote:

> Hi,
>
> Look for DEFRAG_REQUIREMENTS and DEFRAG_WHOLE_MACHINE_EXPR.

I did.

The DEFRAG_REQUIREMENTS expression matches ~400 nodes.

DEFRAG_WHOLE_MACHINE_EXPR matches 4 nodes (2 of which are not eligible for draining).

I increased the DefragLog verbosity, and now I see a reason:

[...]
07/17/20 21:22:14 Skipping slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: because it is already running as a whole machine.
07/17/20 21:22:14 Skipping slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: because it is already running as a whole machine.
[...]

I think this is because I had
DEFRAG_WHOLE_MACHINE_EXPR = ((Cpus == TotalCpus) || (Cpus >= 8)) && (StartJobs =?= True)

and all the skipped machines already had an 8-core job running. Their partitionable slots therefore still had 8 or more unclaimed cores, which matched Cpus >= 8.
I changed DEFRAG_WHOLE_MACHINE_EXPR to
((Cpus == TotalCpus) || (Cpus >= 16)) && (StartJobs =?= True)

and now I see more machines being put into draining.
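The Python sketch below illustrates why the original expression caused the skips. It models the two DEFRAG_WHOLE_MACHINE_EXPR variants as plain functions. It assumes a hypothetical 16-core machine whose partitionable slot reports, in `Cpus`, the cores still unclaimed after an 8-core job starts. It treats `StartJobs =?= True` as "defined and True". This is not HTCondor's ClassAd evaluator. The class, function names, and slot values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionableSlot:
    """Hypothetical stand-in for the ClassAd attributes the expressions read."""
    total_cpus: int              # TotalCpus: all cores on the machine
    cpus: int                    # Cpus: cores still unclaimed in the partitionable slot
    start_jobs: Optional[bool]   # StartJobs; None models an undefined attribute


def meta_true(value: Optional[bool]) -> bool:
    # Rough model of `X =?= True`: true only when X is defined and is True.
    return value is True


def old_whole_machine(slot: PartitionableSlot) -> bool:
    # ((Cpus == TotalCpus) || (Cpus >= 8)) && (StartJobs =?= True)
    return (slot.cpus == slot.total_cpus or slot.cpus >= 8) and meta_true(slot.start_jobs)


def new_whole_machine(slot: PartitionableSlot) -> bool:
    # ((Cpus == TotalCpus) || (Cpus >= 16)) && (StartJobs =?= True)
    return (slot.cpus == slot.total_cpus or slot.cpus >= 16) and meta_true(slot.start_jobs)


if __name__ == "__main__":
    # Hypothetical 16-core machine with one 8-core job running: 8 cores remain unclaimed.
    busy = PartitionableSlot(total_cpus=16, cpus=8, start_jobs=True)
    print("old expr treats it as whole:", old_whole_machine(busy))  # True, so it is skipped
    print("new expr treats it as whole:", new_whole_machine(busy))  # False, so it can be drained
```

On a larger machine, for example 32 cores with 24 still free, the `Cpus >= 16` branch would again classify the machine as whole. The threshold therefore has to be chosen relative to the core counts actually present in the pool.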

Thanks,
Stefano


> These knobs define which machines can be drained and what counts as a drained machine.
>
> For example:
>
> # machine should be partitionable and online
> DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True
> # drain down to a blob of 12 online cores
> DEFRAG_WHOLE_MACHINE_EXPR = Cpus == 12 && Offline=!=True
>
> Best
> christoph
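In Christoph's example, the two knobs act together. The "already running as a whole machine" log lines above suggest that a machine becomes a drain candidate only when it satisfies DEFRAG_REQUIREMENTS and does not already satisfy DEFRAG_WHOLE_MACHINE_EXPR. The Python sketch below models that filter with his example expressions on a few hypothetical machines. The `Machine` fields, host names, and helper functions are illustrative assumptions. `Offline=!=True` is approximated as "Offline is not True". The sketch ignores any rate limits or other policy the real condor_defrag daemon applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical summary of a startd's partitionable slot."""
    name: str
    partitionable: bool       # PartitionableSlot
    cpus: int                 # Cpus: unclaimed cores in the partitionable slot
    offline: Optional[bool]   # Offline; None models an undefined attribute


def not_meta_true(value: Optional[bool]) -> bool:
    # Rough model of `X =!= True`: true unless X is defined and is True.
    return value is not True


def defrag_requirements(m: Machine) -> bool:
    # DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True
    return m.partitionable and not_meta_true(m.offline)


def whole_machine(m: Machine) -> bool:
    # DEFRAG_WHOLE_MACHINE_EXPR = Cpus == 12 && Offline=!=True
    return m.cpus == 12 and not_meta_true(m.offline)


def drain_candidates(machines: List[Machine]) -> List[str]:
    # Eligible per DEFRAG_REQUIREMENTS, minus machines already counted as whole.
    return [m.name for m in machines if defrag_requirements(m) and not whole_machine(m)]


if __name__ == "__main__":
    pool = [
        Machine("node-a", partitionable=True, cpus=12, offline=None),   # already whole
        Machine("node-b", partitionable=True, cpus=4, offline=None),    # fragmented
        Machine("node-c", partitionable=True, cpus=4, offline=True),    # offline
        Machine("node-d", partitionable=False, cpus=1, offline=None),   # static slot
    ]
    print(drain_candidates(pool))  # ['node-b']
```

Stefano's problem fits this pattern. A whole-machine expression that is too loose (`Cpus >= 8`) moves partly busy machines into the "already whole" set, so the candidate list shrinks even though DEFRAG_REQUIREMENTS matches many nodes.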

References:
- [HTCondor-users] Drained 0 machines
  - From: Stefano Dal Pra
- Re: [HTCondor-users] Drained 0 machines
  - From: Beyer, Christoph
