Re: [Condor-users] Help with scheduling start/evict policy in condor_config

Ian Stokes-Rees wrote:
> On 4/14/10 11:48 AM, Dan Bradley wrote:
>> One likely source of trouble in this policy is that RANK is inherently
>> a preemptive mechanism.  RANK is only relevant when deciding whether
>> to preempt an existing job with a new better-ranked one.  This can
>> lead to rapid cycles of preemption in some cases.
>
> I can rephrase the question as follows: what needs to be done to make
> sure that idle job slots are considered first in each matching cycle?

If I read your ticket correctly, the policy should already ensure that jobs are sent to idle slots whenever any are available:

NEGOTIATOR_PRE_JOB_RANK = (RemoteOwner =?= UNDEFINED) * SlotID
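
To make the intent of that expression concrete, here is a minimal Python sketch that mimics how it would evaluate against a few slot ads. It assumes a slot ad can be modelled as a plain dict in which a missing `RemoteOwner` key stands in for ClassAd UNDEFINED, and that a boolean used in arithmetic counts as 1 or 0. The helper names and slot data are hypothetical, not HTCondor APIs.

```python
# Hypothetical model: a slot ad is a dict; a missing key means UNDEFINED.

def meta_equal_undefined(ad, attr):
    """Mimic `attr =?= UNDEFINED`: always True or False, never UNDEFINED."""
    return attr not in ad

def pre_job_rank(slot_ad):
    """Mimic (RemoteOwner =?= UNDEFINED) * SlotID, treating True/False as 1/0."""
    return int(meta_equal_undefined(slot_ad, "RemoteOwner")) * slot_ad["SlotID"]

slots = [
    {"Name": "slot1@node-a", "SlotID": 1},
    {"Name": "slot2@node-a", "SlotID": 2, "RemoteOwner": "alice@example.org"},
    {"Name": "slot3@node-b", "SlotID": 3},
]

for ad in slots:
    print(ad["Name"], pre_job_rank(ad))
# slot1@node-a 1
# slot2@node-a 0
# slot3@node-b 3
```

Since SlotID normally starts at 1, every unclaimed slot scores at least 1 while every claimed slot scores 0, so an idle match should outrank a claimed one as long as the negotiator's view of RemoteOwner is current. The SlotID factor only breaks ties among idle slots, favouring higher-numbered ones.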

So if it can be confirmed that an idle slot matches the job but the negotiator is instead matching the job to some other slot that is claimed, then we'll need to examine that case closely and understand why NEGOTIATOR_PRE_JOB_RANK is not having the expected effect. If the negotiator had a stale view of the machine state (so that it doesn't realize a slot is claimed), that could lead to this sort of behavior. However, I see nothing in the configuration that would cause that. We may need to look at the negotiator log to see what is going on.
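
The stale-view failure mode can be illustrated with the same kind of toy model. The sketch below is purely hypothetical: the `choose_slot` helper and the slot data are invented, and negotiation is reduced to "highest pre-job rank wins". It shows that if the negotiator's cached ad for a slot still lacks `RemoteOwner` after the slot has been claimed, that claimed slot can win over a genuinely idle one.

```python
# Toy model only: negotiation is reduced to "highest pre-job rank wins".

def pre_job_rank(slot_ad):
    """(RemoteOwner =?= UNDEFINED) * SlotID, with True/False treated as 1/0."""
    return int("RemoteOwner" not in slot_ad) * slot_ad["SlotID"]

def choose_slot(slot_ads):
    """Return the Name of the slot ad with the highest pre-job rank."""
    return max(slot_ads, key=pre_job_rank)["Name"]

# What is actually true on the execute nodes right now.
actual = [
    {"Name": "slot1@node-a", "SlotID": 1},
    {"Name": "slot2@node-b", "SlotID": 2, "RemoteOwner": "alice@example.org"},
]

# A stale cached copy: slot2 was claimed after its last ad update,
# so the negotiator still sees it without RemoteOwner.
stale = [
    {"Name": "slot1@node-a", "SlotID": 1},
    {"Name": "slot2@node-b", "SlotID": 2},
]

print("fresh view picks:", choose_slot(actual))  # slot1@node-a (idle)
print("stale view picks:", choose_slot(stale))   # slot2@node-b (actually claimed)
```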

> What we think we see now is that matching is done
> against an arbitrary machine, whether it is idle or not, and the RANK
> expression means that no consideration is given to a running job, even
> when other idle nodes are available.

The machine RANK expression specifies which jobs the machine prefers. If a job matches multiple machines, some idle and some claimed, and NEGOTIATOR_PRE_JOB_RANK prefers to run the job on the idle machines, then the machine RANK expression should not matter.
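
Why machine RANK should not matter here can be modelled as a lexicographic sort. The sketch below is a simplification under stated assumptions: it treats NEGOTIATOR_PRE_JOB_RANK as the first sort key and the job's own RANK as the second, and it lets a claimed slot be a candidate only if the machine RANK prefers the new job over the running one (rank preemption). Priority preemption, NEGOTIATOR_POST_JOB_RANK and PREEMPTION_RANK are left out, and all names, policies and numbers are hypothetical.

```python
# Simplified model of match ordering (hypothetical, not HTCondor source code):
# candidates are sorted by NEGOTIATOR_PRE_JOB_RANK first, then by the job's RANK.
# Machine RANK is consulted only to decide whether a claimed slot may be preempted.

def is_idle(slot):
    """Treat a missing RemoteOwner as UNDEFINED, i.e. the slot is unclaimed."""
    return "RemoteOwner" not in slot

def pre_job_rank(slot):
    """(RemoteOwner =?= UNDEFINED) * SlotID"""
    return int(is_idle(slot)) * slot["SlotID"]

def best_match(slots, machine_rank, job_rank, new_job):
    """Return the Name of the slot the new job would land on, or None."""
    pool = [
        s for s in slots
        # Idle slots always qualify; a claimed slot qualifies only through
        # rank preemption, i.e. its machine RANK prefers the new job.
        if is_idle(s) or machine_rank(new_job) > machine_rank(s["CurrentJob"])
    ]
    if not pool:
        return None
    # Lexicographic sort: pre-job rank dominates, job RANK breaks ties.
    return max(pool, key=lambda s: (pre_job_rank(s), job_rank(s)))["Name"]

def machine_rank(job):
    """Hypothetical machine policy: prefer jobs from the "physics" group."""
    return 10 if job["Group"] == "physics" else 0

def job_rank(slot):
    """Hypothetical job policy: prefer slots with more memory."""
    return slot["Memory"]

idle_slot = {"Name": "slot1@node-a", "SlotID": 1, "Memory": 2048}
busy_slot = {"Name": "slot2@node-b", "SlotID": 2, "Memory": 8192,
             "RemoteOwner": "bob@example.org", "CurrentJob": {"Group": "chem"}}
new_job = {"Group": "physics"}

# The idle slot wins, even though the job's RANK prefers slot2 and
# node-b's machine RANK would allow the new job to preempt.
print(best_match([idle_slot, busy_slot], machine_rank, job_rank, new_job))
# Only when no idle slot matches does rank preemption come into play.
print(best_match([busy_slot], machine_rank, job_rank, new_job))
```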

--Dan