
[HTCondor-users] Negotiator, matchmaking and partitionable/dynamic slots...



Hi all,

I've been using HTCondor for the last three months, in an effort to replace our current resource management scheme.
Everything has been working very well so far, but I have some doubts about the negotiator's matchmaking algorithm when
partitionable slots come into play. That's our current strategy: using only partitionable slots (any advice for or
against it is welcome).

I noticed very early that we were vulnerable to starvation of bigger jobs once there were no free resources left in our
pool. One user, with higher priority than the users already running, had bigger jobs that never got to run because
smaller jobs got the resources first. I made some adjustments to the PREEMPTION_REQUIREMENTS and PREEMPTION_RANK
expressions, which more or less solved the issue, but I'm not confident that I've understood the process entirely. If
anyone can help me get it right, I'd appreciate it.

Currently, we're using the newest stable release, 8.6.3, on all machines: execute and submit nodes, as well as the
negotiator and collector. ALLOW_PSLOT_PREEMPTION is set to True, to allow the negotiator to make room for bigger jobs by
preempting smaller ones. Below is an excerpt from the current HTCondor manual, section 3.6, where the matchmaking process
is explained. I've inserted comments at the points where I'm unsure.


For simplicity, I'll consider an example scenario with two 12-core machines using partitionable slots, and four 6-core
jobs running. A user with higher priority then submits a 12-core job. No RANK expressions have been configured (default
values), so we shouldn't get any preemption due to RANK.
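To make the core arithmetic of this scenario explicit, here is a small Python sketch. It only models core counts: the
`Machine` class and the `dslots_to_preempt` helper are hypothetical illustrations, not HTCondor code, and evicting the
largest dynamic slots first is an assumption made for the sketch, not documented negotiator behaviour.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    name: str
    total_cores: int
    # Hypothetical model: cores held by each dynamic slot carved from the partitionable slot.
    dslot_cores: List[int] = field(default_factory=list)

    def free_cores(self) -> int:
        # Cores still unassigned in the partitionable slot.
        return self.total_cores - sum(self.dslot_cores)


def dslots_to_preempt(machine: Machine, request_cores: int) -> int:
    """Smallest number of dynamic slots to evict so request_cores fit, or -1 if impossible."""
    if request_cores > machine.total_cores:
        return -1
    missing = request_cores - machine.free_cores()
    evicted = 0
    # Evicting the largest dynamic slots first minimises how many have to go.
    for cores in sorted(machine.dslot_cores, reverse=True):
        if missing <= 0:
            break
        missing -= cores
        evicted += 1
    return evicted


# Two 12-core machines, each running two 6-core jobs.
pool = [Machine("m1", 12, [6, 6]), Machine("m2", 12, [6, 6])]
for m in pool:
    print(m.name, m.free_cores(), dslots_to_preempt(m, 12))
# m1 0 2
# m2 0 2
```

With both machines full, the 12-core job cannot start anywhere unless both 6-core jobs on one machine are preempted,
which is the situation ALLOW_PSLOT_PREEMPTION is there to handle.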


Source: https://research.cs.wisc.edu/htcondor/manual/v8.6/3_6User_Priorities.html


During a negotiation cycle, the condor_negotiator daemon accomplishes the following ordered list of items.

1. Build a list of all possible resources, regardless of the state of those resources.
  --> I assume we're talking about both Partitionable and Dynamic slots here, right?

2. Obtain a list of all job submitters (for the entire pool).

3. Sort the list of all job submitters based on EUP (see section 3.6.2 for an explanation of EUP). The submitter with
   the best priority is first within the sorted list.

4. Iterate until there are either no more resources to match, or no more jobs to match.

    For each submitter (in EUP order):

        For each submitter, get each job. Since jobs may be submitted from more than one machine (hence to more than one
        condor_schedd daemon), here is a further definition of the ordering of these jobs. With jobs from a single
        condor_schedd daemon, jobs are typically returned in job priority order. When more than one condor_schedd
        daemon is involved, they are contacted in an undefined order. All jobs from a single condor_schedd daemon are
        considered before moving on to the next. For each job:

            - For each machine in the pool that can execute jobs:  --> again, both Partitionable and Dynamic slots?

                4.1. If machine.requirements evaluates to False or job.requirements evaluates to False, skip this
                     machine
                     --> given the example above, at this point I believe the Partitionable slots should be skipped,
                     since their cores are exhausted?

                4.2. If the machine is in the Claimed state, but not running a job, skip this machine.

                4.3. If this machine is not running a job, add it to the potential match list by reason of No
                     Preemption.

                4.4. If the machine is running a job

                    - If the machine.RANK on this job is better than the running job, add this machine to the potential
                      match list by reason of Rank.
                      --> not applicable

                    - If the EUP of this job is better than the EUP of the currently running job, and
                      PREEMPTION_REQUIREMENTS is True, and the machine.RANK on this job is not worse than the currently
                      running job, add this machine to the potential match list by reason of Priority.
                      --> this is one of the points where I have the biggest doubts. I've experimented with
                      PREEMPTION_REQUIREMENTS, using the debug function, and it appears that the expression is
                      evaluated against Partitionable slots, not Dynamic ones. Is that true? If so, should I assume
                      that all corresponding Dynamic slots belong to users with worse priority than the current job's
                      owner, or do I have to enforce this in the expression itself?

            - Of machines in the potential match list, sort by NEGOTIATOR_PRE_JOB_RANK, job.RANK,
              NEGOTIATOR_POST_JOB_RANK, Reason for claim (No Preemption, then Rank, then Priority), PREEMPTION_RANK
                --> On this point, it appears that PREEMPTION_RANK is applied to the Dynamic slots of the corresponding
                Partitionable ones. I admit I didn't experiment with this much, since the previous point had me
                confused enough.

            - The job is assigned to the top machine on the potential match list. The machine is removed from the list
              of resources to match (on this negotiation cycle).
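The steps above can be sketched as a single, simplified negotiation pass in Python. This is a model of the quoted manual
text, not the negotiator's implementation: `Job`, `Slot`, `negotiate` and the callables standing in for machine.RANK,
job.RANK, PREEMPTION_REQUIREMENTS and PREEMPTION_RANK are hypothetical names; both Requirements expressions are reduced
to a core-count check; NEGOTIATOR_PRE_JOB_RANK and NEGOTIATOR_POST_JOB_RANK are left out; and every slot is treated as a
fixed-size static slot, so the partitionable/dynamic slot handling asked about here is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Claim reasons in the manual's preference order: No Preemption, then Rank, then Priority.
NO_PREEMPTION, RANK, PRIORITY = 0, 1, 2
REASON_NAMES = {NO_PREEMPTION: "No Preemption", RANK: "Rank", PRIORITY: "Priority"}


@dataclass
class Job:
    owner: str
    cores: int


@dataclass
class Slot:
    name: str
    cores: int
    claimed: bool = False
    running: Optional[Job] = None


SlotExpr = Callable[[Slot, Job], float]


def negotiate(
    jobs_by_submitter: Dict[str, List[Job]],
    slots: List[Slot],
    eup: Dict[str, float],
    machine_rank: SlotExpr = lambda slot, job: 0.0,   # stand-in for machine.RANK
    job_rank: SlotExpr = lambda slot, job: 0.0,       # stand-in for job.RANK
    preemption_requirements: Callable[[Slot, Job], bool] = lambda slot, job: True,
    preemption_rank: SlotExpr = lambda slot, job: 0.0,
) -> List[Tuple[str, str, str]]:
    """One simplified pass; returns (owner, slot name, reason) for each match."""
    available = list(slots)
    matches = []
    # Step 3: a lower EUP value is a better priority, so sort ascending.
    for submitter in sorted(jobs_by_submitter, key=lambda name: eup[name]):
        for job in jobs_by_submitter[submitter]:
            if not available:  # step 4: stop when no resources are left
                return matches
            candidates = []
            for slot in available:
                # 4.1: a core-count check stands in for both Requirements expressions.
                if slot.cores < job.cores:
                    continue
                # 4.2: Claimed but not running a job: skip.
                if slot.claimed and slot.running is None:
                    continue
                # 4.3: not running a job: candidate without preemption.
                if slot.running is None:
                    candidates.append((slot, NO_PREEMPTION))
                    continue
                # 4.4: running a job: candidate by Rank, else possibly by Priority.
                new_rank = machine_rank(slot, job)
                old_rank = machine_rank(slot, slot.running)
                if new_rank > old_rank:
                    candidates.append((slot, RANK))
                elif (eup[job.owner] < eup[slot.running.owner]
                      and preemption_requirements(slot, job)
                      and new_rank >= old_rank):
                    candidates.append((slot, PRIORITY))
            if not candidates:
                continue
            # Higher job rank first, then reason order, then higher PREEMPTION_RANK.
            candidates.sort(key=lambda c: (-job_rank(c[0], job), c[1],
                                           -preemption_rank(c[0], job)))
            best_slot, reason = candidates[0]
            available.remove(best_slot)  # not offered again in this cycle
            matches.append((job.owner, best_slot.name, REASON_NAMES[reason]))
    return matches


eup = {"alice": 1.0, "bob": 10.0, "carol": 50.0}
slots = [
    Slot("s1", 6),
    Slot("s2", 6, claimed=True, running=Job("bob", 6)),
    Slot("s3", 6, claimed=True, running=Job("carol", 6)),
    Slot("s4", 6, claimed=True),
]


def prefer_worst_eup(slot: Slot, job: Job) -> float:
    # Hypothetical PREEMPTION_RANK: prefer evicting the owner with the worst EUP.
    return eup[slot.running.owner] if slot.running else 0.0


result = negotiate({"alice": [Job("alice", 6) for _ in range(3)]}, slots, eup,
                   preemption_rank=prefer_worst_eup)
print(result)
# [('alice', 's1', 'No Preemption'), ('alice', 's3', 'Priority'), ('alice', 's2', 'Priority')]
```

In this toy run, alice's first job takes the idle slot s1 (No Preemption), the claimed-but-idle slot s4 is skipped, and
the stand-in PREEMPTION_RANK makes carol's slot, whose owner has the worst EUP, the first one preempted.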



Well, I hope I've made myself clear enough, and that some of you can clarify things a little.


Thanks in advance!