
Re: [HTCondor-users] condor_defrag only some machines?



On Thu, Jan 12, 2017 at 3:16 PM, Michael Pelletier
<Michael.V.Pelletier@xxxxxxxxxxxx> wrote:
> The ExpectedMachineGracefulDrainingBadput is an estimate of how much work will be lost when the jobs running on that machine are evicted. When a machine is drained, the jobs are instructed to gracefully evict, which means they are sent a TERM signal (by default) and allowed up to the MaxJobRetirementTime (default of zero) to shut down before being kill -9'd.
>
> A machine with 10 jobs which have accumulated 30 minutes each, if evicted, will have a minimum of 300 minutes of badput, while a machine with 1 job with 60 minutes of runtime will have 60 minutes of badput if evicted, so it will be chosen for draining ahead of the first machine.

thanks, that makes sense.  the piece i was missing was that the lower
the badput number, the more likely a node is to get drained.  that
probably explains why i'm seeing the same sets of nodes constantly
re-draining: we have a very diverse mix of jobs in the pool right now,
so some of them must be churning every 15-20 mins while others take
hours.
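(for the archives: if i'm reading the manual right, the defrag daemon
picks machines by the DEFRAG_RANK expression, which defaults to
preferring the least badput, so the churn can be biased or throttled.
something roughly like this, untested, with example values:)

```
# default: drain the machines that would waste the least work
DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput
# only consider a subset of machines for draining
DEFRAG_REQUIREMENTS = PartitionableSlot && TotalCpus >= 16
# throttle how aggressively defrag cycles through nodes
DEFRAG_MAX_CONCURRENT_DRAINING = 2
DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
```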

> Have you taken a look at pslot preemption? I wonder if that might be more useful for your situation than defragmenting. It seems like that might give you more control over when a whole-machine job can evict the single-core jobs, and avoid any draining at all if there are no whole-machine jobs waiting to run.

i've not.  we turned off all preemption some time ago because it was
wreaking havoc with our users.  condor was fine, but users were very
displeased to see a job preempted 2 mins before it would have
finished.  i'm sure some tweaking and user training could correct
that, but i'm not sure i can stomach it again.
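(if we ever do revisit it, my understanding from the docs is that
pslot preemption is switched on at the negotiator, paired with a
retirement window so running jobs get a chance to finish rather than
being killed outright.  a sketch, not something we run; the 3600 is
just an example value:)

```
# negotiator: let a multi-core request preempt several dynamic
# slots carved out of the same partitionable slot at once
ALLOW_PSLOT_PREEMPTION = True
# startd: give gracefully-evicted jobs up to this many seconds
# to finish before they are hard-killed (default is 0)
MAXJOBRETIREMENTTIME = 3600
```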

> Also, make sure that you're doing a depth-first fill of the machines for the single-core jobs, which may give the whole-machine jobs a better fighting chance; and make sure your job_lease_duration is set to something reasonable - the default is 40 minutes, but I usually use 20 (it depends on the characteristics of your jobs).

we're currently filling across machines rather than within them --
slot1 on all machines, then slot2 on all machines, and so on -- by way
of a custom RANK setting.  we did this to level out power/cooling
consumption across the data center (which works nicely), but i'm
guessing it's probably not ideal for my current situation.
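(for anyone following along: if i understand the manual, depth-first
packing for partitionable slots has a dedicated knob in recent
versions, and the lease is a per-job submit setting.  untested sketch;
check your version's manual before trusting either line:)

```
# negotiator config: pack jobs onto already-busy machines first,
# leaving whole machines free for whole-machine jobs
NEGOTIATOR_DEPTH_FIRST = True
```

```
# submit file: shorten the claim lease so slots held by dead
# shadows free up sooner (seconds; 1200 = the 20 min suggested)
job_lease_duration = 1200
```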