
Re: [HTCondor-users] Is ExpectedMachineGracefulDrainingBadput the sum of subslots and related defrag questions



# this ought to work even if ExpectedRuntimeHours were undefined, right?
START = $(START) && (KillableJob =?= true || ExpectedRuntimeHours <= 6)

	Seems unlikely -- an undefined attribute propagates through <= and && in ClassAd evaluation, so the whole expression comes out undefined rather than true:

$ classad_eval 'ExpectedRunTimeHours <= 6'
[  ]
undefined
$ classad_eval 'ExpectedRunTimeHours = 5' 'ExpectedRunTimeHours <= 6'
[ ExpectedRunTimeHours = 5 ]
true
$ classad_eval 'START=true' 'START && (KillableJob =?= true || ExpectedRunTimeHours <= 6)'
[ START = true ]
undefined
$ classad_eval 'ExpectedRuntimeHours = 5' 'START=true' 'START && (KillableJob =?= true || ExpectedRunTimeHours <= 6)'
[ START = true; ExpectedRuntimeHours = 5 ]
true
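
If the goal is for the machine to match even when the job doesn't advertise ExpectedRuntimeHours at all, one way (just a sketch, assuming that's the intent) is to handle the undefined case explicitly with =?=, which always yields true or false:

# Sketch: treat a missing ExpectedRuntimeHours as acceptable instead of
# letting undefined propagate through the whole START expression.
START = $(START) && (KillableJob =?= true || ExpectedRuntimeHours =?= undefined || ExpectedRuntimeHours <= 6)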

Also, while I expect DEFRAG_RANK to mostly steer condor_defrag to the machines with lower MaxJobVacateTime, should we worry about DEFRAG_MAX_CONCURRENT_DRAINING = 10 if we have many more than 10 of the second kind of machines defined? If so, any idea which handle to use to ensure a good turn-around time?

DEFRAG_MAX_CONCURRENT_DRAINING is just a throttle, and what you want to set it to is as much a matter of your job mix as your hardware configuration. To absolutely minimize turn-around time of the "big" jobs, of course, you'd just not run "small" jobs on the big-job machines. Otherwise, it seems like setting the throttles to allow the defrag daemon to start draining all of your second type of machines would result in the shortest turn-around time. It's just not as efficient.
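
For instance (a sketch only -- the 40 is made up and just stands in for "at least as many as you have of the second type of machine"; check whether the per-hour rate limit also matters in your pool):

# Hypothetical numbers: let defrag drain every big-job machine at once
# rather than 10 at a time; tune to your own pool size and job mix.
DEFRAG_MAX_CONCURRENT_DRAINING = 40
DEFRAG_DRAINING_MACHINES_PER_HOUR = 40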

Have we?

Looks generally sane to me, although I can't speak to the question of whether the badput numbers are summed across d-slots.

Depending on how much need there is for these very large slots, you may also want to discourage them from matching smaller jobs -- you spent quite a bit of effort draining them. One trick I've heard of is to adjust the START expression for the designated big-job slots so they avoid matching small jobs for some amount of time after a defrag. (HTCondor matches jobs based on user priority, so this allows that startd to wait until the high-priority but small jobs have all been started elsewhere.)
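
A very rough sketch of that trick (the 16-core threshold, the 10-minute window, and leaning on EnteredCurrentState to approximate "time since the drain finished" are all my assumptions, not anything from your config):

# Hold out for big jobs for a while after the slot empties out: accept
# small jobs only once the slot has sat in its current (unclaimed) state
# longer than the holdout window.
BIG_JOB_CPUS = 16
HOLDOUT_SECONDS = 600
START = $(START) && ( TARGET.RequestCpus >= $(BIG_JOB_CPUS) || \
                      (time() - EnteredCurrentState) > $(HOLDOUT_SECONDS) )

The idea is just that for the first few minutes a freshly drained machine only looks attractive to whole-machine-sized jobs, and falls back to taking anything after that.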

- ToddM