Re: [HTCondor-users] Troubleshooting job eviction by machine RANK
- Date: Thu, 04 Feb 2016 15:46:13 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Troubleshooting job eviction by machine RANK
On 2/3/2016 6:28 PM, Graham Allan wrote:
Now I have seen allusions to issues between partitionable slots and
preemption, but not exactly what they are; I had the impression it was
something to do with evicted jobs leaving the slots fragmented, rather
than preemption simply not happening.
Perhaps it is something as simple as this: you put your startd Rank
expression into your condor_config file(s) but never ran condor_reconfig?
The startd Rank expression should appear in the slot ClassAds; you can
check by running something like

    condor_status -af:r name rank
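If the expression was added to the config files but never picked up, a
reconfig (rather than a full restart) is usually enough. A sketch,
assuming the config change is already on disk on each execute node:

    condor_reconfig              # re-read condor_config on this host
    condor_reconfig <hostname>   # or target a particular startd host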
As for the issue with startd rank and partitionable slots: your
impression above is correct; the only issue is fragmentation. Your
partitionable slot ("slot1@machine") always represents the unclaimed
resources. When a job is matched to that machine, a "dynamic" slot
(slot1_1@machine, slot1_2@machine, etc.) is created that typically
contains just enough CPU and memory to handle the matched job (although
there are config knobs available to the admin to round up the
resources). These dynamic slots will honor your startd Rank expression.
So, for instance, if

    Rank = CondorGroup =?= "nova"
then a nova job will preempt a non-nova job running on a dynamic slot,
but ONLY IF the nova job "fits" in the dynamic slot. So imagine you
have an infinite supply of non-nova jobs, all with request_cpus=1, and
then you submit a nova job with request_cpus=4. The nova job could
starve forever: even though it will take over any dynamic slot running
a non-nova job, there may never be a dynamic slot in the pool with 4
CPUs allocated, so the nova job will not match any slots.
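For concreteness, a minimal submit-file sketch of such a starving job.
CondorGroup here is a custom job attribute that the Rank expression
above is assumed to test; the +Attr syntax inserts it into the job
ClassAd:

    # hypothetical 4-core "nova" job; CondorGroup is a custom attribute
    universe       = vanilla
    executable     = /bin/sleep
    arguments      = 600
    request_cpus   = 4
    +CondorGroup   = "nova"
    queue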
There are two solutions to this problem. The first is to use knobs like
MODIFY_REQUEST_EXPR_REQUESTCPUS (and the analogous REQUESTDISK and
REQUESTMEMORY knobs) in your condor_config to always round up the
number of allocated CPUs and amount of memory to something usable by
nova jobs, i.e. don't allow non-nova jobs to fragment your machines
into slots so small that nova jobs won't match.
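As a sketch of this rounding approach (the quantize values here are
assumptions; pick sizes that match the shape of your nova jobs):

    # round CPU requests up to multiples of 4, memory to multiples of 8 GB
    MODIFY_REQUEST_EXPR_REQUESTCPUS   = quantize(RequestCpus, {4})
    MODIFY_REQUEST_EXPR_REQUESTMEMORY = quantize(RequestMemory, {8192})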
A second solution exists if you have HTCondor v8.4 running on your
schedd, startds, and central manager. With v8.4, you can set

    ALLOW_PSLOT_PREEMPTION = True
    PREEMPTION_REQUIREMENTS = True

on your central manager. This avoids the fragmentation issue above as
it relates to startd rank, because HTCondor will now preempt multiple
dynamic slots as needed in order to fit the higher-ranked job. E.g.,
HTCondor will preempt four 1-core non-nova jobs on a machine in order
to fit your 4-core nova job, as preferred by your startd Rank.
Hope the above helps,