
Re: [HTCondor-users] Troubleshooting job eviction by machine RANK



Hi Todd,

On 02/04/2016 03:46 PM, Todd Tannenbaum wrote:

> Perhaps it is something as simple as you put your startd rank expression
> into your condor_config file(s) but failed to do a condor_reconfig?  The
> startd Rank expression should appear in the slot classads; you can check
> by doing something like
>    condor_status -af:r name rank

Thanks for your thoughts. Everything is reporting rank back correctly though. I generate the machine-specific part of the config via a periodic script (which sets things based on various configuration management values) and run condor_reconfig if the generated file has changed.
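For illustration, the relevant piece boils down to something like this (a simplified sketch rather than the literal generated file, borrowing the CondorGroup/nova example used later in the thread):

   # machine-local config fragment written by the periodic script (sketch)
   RANK = CondorGroup =?= "nova"

   # if the fragment changed, the script then runs:
   condor_reconfig

   # and the slot ads can be checked as suggested:
   condor_status -af:r name rank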

Maybe I should post the complete generated config rather than just the highlights I sent previously; there could be something stupid set which I didn't realize was pertinent...

> As for the issue with startd rank and partitionable slots: your
> impression above is correct; the only issue is with fragmentation.  Your
> partitionable slot ("slot1@machine") always represents the unclaimed
> resources; when a job is matched to that machine, a "dynamic" slot is
> created (slot1_1@machine, slot1_2@machine, etc) that typically contains
> just enough CPU and Memory to handle the matched job (although there are
> config knobs available to the admin to round up the resources).  These
> dynamic slots will honor your startd Rank expression.  So for instance, if
>    Rank = CondorGroup =?= "nova"
> then a nova job will preempt a non-nova job running on a dynamic slot,
> but ONLY IF the nova job "fits" in the dynamic slot.  So imagine you
> have an infinite number of non-nova jobs that all have request_cpus=1
> and then you submit a nova job with request_cpus=4.  The nova job could
> starve forever: even though the nova job will take over any dynamic
> slot running a non-nova job, there may never be any dynamic slots in
> the pool with 4 cpus allocated, so the nova job will not match any slots.
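For concreteness: the Rank expression above tests an attribute that the job itself advertises, so a "nova" job's submit file would carry something like this (a sketch; the attribute name follows the example above and the executable name is made up):

   # sketch of a 4-core "nova" job submit description
   universe       = vanilla
   executable     = nova_task.sh
   request_cpus   = 4
   request_memory = 2048
   # custom job attribute tested by the startd RANK expression
   +CondorGroup   = "nova"
   queue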

I don't think we should be running into any problems with this - I'm pretty sure all the jobs concerned are simple single-cpu ones. To be strictly accurate, I'd bet that most jobs don't specify any cpu count at all (though that defaults to 1).
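For what it's worth, something like this should confirm it (a sketch):

   # what the jobs in the queue are actually requesting
   condor_q -allusers -af ClusterId ProcId RequestCpus

   # how big the currently carved-out dynamic slots are
   condor_status -constraint 'SlotType == "Dynamic"' -af Name Cpus Memory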

If the above issue were to occur though (using the nova example), would the nova job successfully evict the non-nova job even though it would then fail to run there itself?

> There are two solutions to this problem.  One solution is to use knobs like
> MODIFY_REQUEST_EXPR_REQUEST(CPUS|DISK|MEMORY) in your condor_config to
> always round up the allocated cpus and memory to something usable by
> nova jobs, i.e. don't allow non-nova jobs to fragment your machines
> into slots so small that nova jobs won't match.
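In config terms on the execute nodes that would be something like the following, followed by a condor_reconfig of the startds (a sketch; the rounding sizes are made-up values, pick whatever a nova job actually needs):

   # round every cpu request up to a multiple of 4 cores
   MODIFY_REQUEST_EXPR_REQUESTCPUS   = quantize(RequestCpus, {4})
   # likewise round memory requests up to 8 GB chunks
   MODIFY_REQUEST_EXPR_REQUESTMEMORY = quantize(RequestMemory, {8192})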

> A second solution exists if you have HTCondor v8.4 running on your
> schedd, startds, and central manager.  With HTCondor v8.4, you can say
>    ALLOW_PSLOT_PREEMPTION = True
>    PREEMPTION_REQUIREMENTS = True
> on your central manager.  This will avoid the fragmentation issue above
> as it relates to startd rank, because HTCondor will now preempt multiple
> dynamic slots as needed in order to fit the higher-ranked job.  E.g.
> HTCondor will preempt four 1-core non-nova jobs from a machine in order
> to fit your 4-core nova job, as preferred by your startd RANK.
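When we get there, presumably the quick check after setting those on the central manager and running condor_reconfig would be something like (a sketch):

   # ask the running negotiator what it actually has
   condor_config_val -negotiator ALLOW_PSLOT_PREEMPTION
   condor_config_val -negotiator PREEMPTION_REQUIREMENTS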

I will keep a note of this, though, for when we are all updated to 8.4 - thanks!

Graham
--
-------------------------------------------------------------------------
Graham Allan - allan@xxxxxxxxxxxxxxx - gta@xxxxxxx - (612) 624-5040
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------