
Re: [HTCondor-users] Is ExpectedMachineGracefulDrainingBadput the sum of subslots and related defrag questions



# this ought to work even if ExpectedRuntimeHours were undefined, right?
START = $(START) && (KillableJob =?= true || ExpectedRuntimeHours <= 6)

	Seems unlikely -- an undefined attribute propagates through <= and && in ClassAd evaluation, so the whole expression comes out undefined rather than true:

$ classad_eval 'ExpectedRunTimeHours <= 6'
[  ]
undefined
$ classad_eval 'ExpectedRunTimeHours = 5' 'ExpectedRunTimeHours <= 6'
[ ExpectedRunTimeHours = 5 ]
true
$ classad_eval 'START=true' 'START && (KillableJob =?= true || ExpectedRunTimeHours <= 6)'
[ START = true ]
undefined
$ classad_eval 'ExpectedRuntimeHours = 5' 'START=true' 'START && (KillableJob =?= true || ExpectedRunTimeHours <= 6)'
[ START = true; ExpectedRuntimeHours = 5 ]
true
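
If the goal is for the machine to match even when the job doesn't advertise ExpectedRuntimeHours at all, one way (just a sketch, assuming that's the intent) is to handle the undefined case explicitly with =?=, which always yields true or false:

# Sketch: treat a missing ExpectedRuntimeHours as acceptable instead of
# letting undefined propagate through the whole START expression.
START = $(START) && (KillableJob =?= true || ExpectedRuntimeHours =?= undefined || ExpectedRuntimeHours <= 6)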

Also, while I expect DEFRAG_RANK to mostly steer condor_defrag to the machines with lower MaxJobVacateTime, should we worry about DEFRAG_MAX_CONCURRENT_DRAINING = 10 if we have many more than 10 of the second kind of machines defined? If so, any idea which handle to use to ensure a good turn-around time?

DEFRAG_MAX_CONCURRENT_DRAINING is just a throttle, and what you want to set it to is as much a matter of your job mix as your hardware configuration. To absolutely minimize turn-around time of the "big" jobs, of course, you'd just not run "small" jobs on the big-job machines. Otherwise, it seems like setting the throttles to allow the defrag daemon to start draining all of your second type of machines would result in the shortest turn-around time. It's just not as efficient.
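
For instance (a sketch only -- the 40 is made up and just stands in for "at least as many as you have of the second type of machine"; check whether the per-hour rate limit also matters in your pool):

# Hypothetical numbers: let defrag drain every big-job machine at once
# rather than 10 at a time; tune to your own pool size and job mix.
DEFRAG_MAX_CONCURRENT_DRAINING = 40
DEFRAG_DRAINING_MACHINES_PER_HOUR = 40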

Have we?

Looks generally sane to me, although I can't speak to the question of whether the badput numbers are summed across d-slots.

Depending on how much need there is for these very large slots, you may also want to discourage them from matching smaller jobs -- you spent quite a bit of effort draining them. One trick I've heard of is to adjust the START expression for the designated big-job slots so they avoid matching small jobs for some amount of time after a defrag. (HTCondor matches jobs based on user priority, so this allows that startd to wait until the high-priority but small jobs have all been started elsewhere.)
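
A very rough sketch of that trick (the 16-core threshold, the 10-minute window, and leaning on EnteredCurrentState to approximate "time since the drain finished" are all my assumptions, not anything from your config):

# Hold out for big jobs for a while after the slot empties out: accept
# small jobs only once the slot has sat in its current (unclaimed) state
# longer than the holdout window.
BIG_JOB_CPUS = 16
HOLDOUT_SECONDS = 600
START = $(START) && ( TARGET.RequestCpus >= $(BIG_JOB_CPUS) || \
                      (time() - EnteredCurrentState) > $(HOLDOUT_SECONDS) )

The idea is just that for the first few minutes a freshly drained machine only looks attractive to whole-machine-sized jobs, and falls back to taking anything after that.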

- ToddM