
Re: [HTCondor-users] Preemption while there are free slots



Hello Todd,


Your guesses led us to the right answer.


The code in Matchmaker::pslotMultiMatch adds the candidate dslot resources to the pslot attributes, which changes MY.(Cpus|Memory|Disk) in the WithinResourceLimits expression and makes it evaluate to TRUE even on full machines.


Our solution was to create another expression, based on the sum of the child resources, Child(Cpus|Memory|Disk), and use it instead.


PSLOT_AVAILABLE_CPUS = ifThenElse(size(ChildCpus) =?= 0, TotalSlotCpus, (TotalSlotCpus - sum(ChildCpus)))
PSLOT_AVAILABLE_MEMORY = ifThenElse(size(ChildMemory) =?= 0, TotalSlotMemory, (TotalSlotMemory - sum(ChildMemory)))

OUR_WITHIN_RESOURCE_LIMIT = ( ($(PSLOT_AVAILABLE_CPUS) >= TARGET.RequestCpus) && ($(PSLOT_AVAILABLE_MEMORY) >= TARGET.RequestMemory) )

NEGOTIATOR_PRE_JOB_RANK = (10000000 + (1000000 * $(OUR_WITHIN_RESOURCE_LIMIT)) - 100000 * $(PSLOT_AVAILABLE_CPUS) - $(PSLOT_AVAILABLE_MEMORY))
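
To make the arithmetic concrete, here is a small Python sketch of the rank expression above. It is only an illustration, not HTCondor code: the function names and the example slot sizes are made up, and it assumes a ClassAd boolean counts as 1 or 0 in arithmetic, as the default rank expression also relies on.

```python
def available(total, child):
    """Mirror ifThenElse(size(Child) =?= 0, Total, Total - sum(Child))."""
    return total if len(child) == 0 else total - sum(child)


def pre_job_rank(total_cpus, child_cpus, total_mem, child_mem,
                 request_cpus, request_mem):
    """Mirror the NEGOTIATOR_PRE_JOB_RANK expression for one pslot and one job."""
    free_cpus = available(total_cpus, child_cpus)
    free_mem = available(total_mem, child_mem)
    # OUR_WITHIN_RESOURCE_LIMIT: does the job fit without preempting any dslot?
    fits = free_cpus >= request_cpus and free_mem >= request_mem
    return 10000000 + 1000000 * int(fits) - 100000 * free_cpus - free_mem


# Hypothetical job asking for 2 CPUs and 4096 MB on three 8-CPU, 16384 MB pslots.
idle = pre_job_rank(8, [], 16384, [], 2, 4096)                    # no dslots yet
partly = pre_job_rank(8, [4], 16384, [8192], 2, 4096)             # half claimed
full = pre_job_rank(8, [4, 4], 16384, [8192, 8192], 2, 4096)      # nothing free

# Higher rank is preferred, so slots with room come before the full one.
ranked = sorted([("idle", idle), ("partly", partly), ("full", full)],
                key=lambda pair: pair[1], reverse=True)
```

In this sketch the partly used slot ranks first, then the idle one, and the full slot last, so preemption is only chosen when nothing else fits. Note that with these weights the 1000000 bonus is outweighed once 100000 * free CPUs + free memory exceeds 1000000 (roughly ten or more free CPUs), so the constants may need adjusting for larger machines.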



Thank you very much for the help,


Best regards,

Zohar


From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Sent: Tuesday, December 22, 2020 7:24:15 PM
To: HTCondor-Users Mail List; Zohar Kol
Subject: Re: [HTCondor-users] Preemption while there are free slots
 
On 12/22/2020 6:34 AM, Zohar Kol wrote:

Hi,


CondorVersion 8.8.9

Setup includes:

  1. Accounting group quota
  2. All dynamic slots
  3. allow_pslot_preemption


We see a lot of job evictions while there are plenty of free resources.

"Job was evicted" is followed by "Job executing on host..." in the next cycle.

This happens while there are machines that can satisfy the requirements of the incoming job without preempting.


What configuration is the relevant place to handle this situation?


We've tried NEGOTIATOR_PRE_JOB_RANK and PREEMPTION_RANK, but those are evaluated against the job and partitionable slot ClassAds, which have no knowledge of negotiator ClassAd attributes such as LastNegotiationCycleMatches<X>.


Any ideas?



Hi Zohar,

I did not test any of my guesses below, so I may be heading in the wrong direction, but my initial guess is the default settings for both NEGOTIATOR_PRE_JOB_RANK and NEGOTIATOR_POST_JOB_RANK are not good defaults if you set ALLOW_PSLOT_PREEMPTION=True.

The HTCondor Manual, in the section that talks about ALLOW_PSLOT_PREEMPTION, gives this vague warning that I think may be relevant to your problem (see https://tinyurl.com/yct4tuks) : "When multiple partitionable slots match a candidate job and the various job rank expressions are evaluated to sort the matching slots, the ClassAd of the partitionable slot is used for evaluation. This may cause unexpected results for some expressions, as attributes such as RemoteOwner will not be present in a partitionable slot that matches with preemption of some of its dynamic slots."

The default values for NEGOTIATOR_(PRE|POST)_JOB_RANK on my laptop (running HTCondor v8.9.10, but my guess is they are essentially the same in v8.8.9) are:

NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory
NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * (ifthenElse(isUndefined(KFlops), 1000, Kflops) - SlotID - 1.0e10*(Offline=?=True))

I am guessing the clause "(RemoteOwner =?= UNDEFINED)" is causing the problem for you, since pslots never have RemoteOwner defined. Normally this is all fine, but with ALLOW_PSLOT_PREEMPTION also changed to True, a matched pslot that does not have enough free resources will start preempting dynamic slots (dslots) until the needed resources are available. Thus I suggest changing that clause so that pslots which already have enough free resources to run the job without preemption are preferred; one way to do this is to replace the RemoteOwner clause with "(WithinResourceLimits =?= True)".

So TL;DR, on your central manager, try setting the following knobs (and then do a condor_reconfig):

NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (WithinResourceLimits =?= True)) - (100000 * Cpus) - Memory
NEGOTIATOR_POST_JOB_RANK = (WithinResourceLimits =?= True) * (ifthenElse(isUndefined(KFlops), 1000,

Again, the above is just a guess...  But if it solves your problem please let us know, and we could improve the defaults for these knobs appropriately, or at least make the vague warning in the HTCondor Manual more helpful with an example.

Hope the above helps,
regards,
Todd