Re: [HTCondor-users] Troubleshooting job eviction by machine RANK
- Date: Thu, 04 Feb 2016 15:46:13 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Troubleshooting job eviction by machine RANK
On 2/3/2016 6:28 PM, Graham Allan wrote:
Now I have seen allusions to issues between partitionable slots and
preemption, but not exactly what they are; I had the impression it was
something to do with evicted jobs leaving the slots fragmented, rather
than preemption simply not happening.
Perhaps it is something as simple as this: you put your startd Rank
expression into your condor_config file(s) but never ran condor_reconfig?
The startd Rank expression should appear in the slot ClassAds; you can
check by running something like

    condor_status -af:r name rank
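If the expression was added to the config files but never picked up, a
reconfig (rather than a full restart) is usually enough. A sketch,
assuming the config change is already on disk on each execute node:

    condor_reconfig              # re-read condor_config on this host
    condor_reconfig <hostname>   # or target a particular startd host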
As for the issue with startd rank and partitionable slots: your
impression above is correct; the only issue is fragmentation. Your
partitionable slot ("slot1@machine") always represents the unclaimed
resources. When a job is matched to that machine, a "dynamic" slot
(slot1_1@machine, slot1_2@machine, etc.) is created that typically
contains just enough CPU and memory to handle the matched job (although
there are config knobs available to the admin to round up the
resources). These dynamic slots will honor your startd Rank expression.
So, for instance, if

    Rank = CondorGroup =?= "nova"
then a nova job will preempt a non-nova job running on a dynamic slot,
but ONLY IF the nova job "fits" in the dynamic slot. So imagine you
have an infinite supply of non-nova jobs, all with request_cpus=1, and
then you submit a nova job with request_cpus=4. The nova job could
starve forever: even though it will take over any dynamic slot running
a non-nova job, there may never be a dynamic slot in the pool with 4
CPUs allocated, so the nova job will not match any slots.
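For concreteness, a minimal submit-file sketch of such a starving job.
CondorGroup here is a custom job attribute that the Rank expression
above is assumed to test; the +Attr syntax inserts it into the job
ClassAd:

    # hypothetical 4-core "nova" job; CondorGroup is a custom attribute
    universe       = vanilla
    executable     = /bin/sleep
    arguments      = 600
    request_cpus   = 4
    +CondorGroup   = "nova"
    queue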
There are two solutions to this problem. The first is to use knobs like
MODIFY_REQUEST_EXPR_REQUESTCPUS (and the analogous REQUESTDISK and
REQUESTMEMORY knobs) in your condor_config to always round up the
number of allocated CPUs and amount of memory to something usable by
nova jobs, i.e. don't allow non-nova jobs to fragment your machines
into slots so small that nova jobs won't match.
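As a sketch of this rounding approach (the quantize values here are
assumptions; pick sizes that match the shape of your nova jobs):

    # round CPU requests up to multiples of 4, memory to multiples of 8 GB
    MODIFY_REQUEST_EXPR_REQUESTCPUS   = quantize(RequestCpus, {4})
    MODIFY_REQUEST_EXPR_REQUESTMEMORY = quantize(RequestMemory, {8192})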
A second solution exists if you have HTCondor v8.4 running on your
schedd, startds, and central manager. With v8.4, you can set

    ALLOW_PSLOT_PREEMPTION = True
    PREEMPTION_REQUIREMENTS = True

on your central manager. This avoids the fragmentation issue above as
it relates to startd rank, because HTCondor will now preempt multiple
dynamic slots as needed in order to fit the higher-ranked job. E.g.,
HTCondor will preempt four 1-core non-nova jobs on a machine in order
to fit your 4-core nova job, as preferred by your startd Rank.
Hope the above helps,