Re: [Condor-users] Help with scheduling start/evict policy in condor_config
- Date: Wed, 14 Apr 2010 14:13:10 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
Ian Stokes-Rees wrote:
> On 4/14/10 11:48 AM, Dan Bradley wrote:
>> One likely source of trouble in this policy is that RANK is inherently
>> a preemptive mechanism. RANK is only relevant when deciding whether
>> to preempt an existing job with a new better-ranked one. This can
>> lead to rapid cycles of preemption in some cases.
> I can rephrase the question as follows: What needs to be done to make sure
> that in each matching cycle idle job slots are considered first?
If I read your ticket correctly, the policy should already ensure that
jobs are sent to idle slots when they are available:
NEGOTIATOR_PRE_JOB_RANK = (RemoteOwner =?= UNDEFINED) * SlotID
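As a rough illustration of why that expression should favor idle slots, here is a minimal Python sketch (not HTCondor code) that mimics how it scores candidate slots. It assumes each slot is a dict in which a missing "RemoteOwner" key stands in for the ClassAd value UNDEFINED, and it relies on the ClassAd convention that TRUE counts as 1 and FALSE as 0 in arithmetic. The slot names and the owner "ian" are made up for the example.

```python
# Hypothetical model of how
#   NEGOTIATOR_PRE_JOB_RANK = (RemoteOwner =?= UNDEFINED) * SlotID
# scores candidate slots. Not HTCondor's ClassAd evaluator.


def is_undefined(value):
    """Mimic `value =?= UNDEFINED`: True only when the attribute is absent."""
    return value is None


def pre_job_rank(slot):
    # TRUE/FALSE become 1/0 in arithmetic, so a claimed slot scores 0
    # and an unclaimed slot scores its SlotID.
    return int(is_undefined(slot.get("RemoteOwner"))) * slot["SlotID"]


def best_slot(slots):
    # The negotiator prefers the candidate with the highest pre-job rank.
    return max(slots, key=pre_job_rank)


slots = [
    {"Name": "slot1@a", "SlotID": 1},                        # idle
    {"Name": "slot2@a", "SlotID": 2, "RemoteOwner": "ian"},  # claimed
    {"Name": "slot3@b", "SlotID": 3},                        # idle
]

for s in slots:
    print(s["Name"], pre_job_rank(s))
print("chosen:", best_slot(slots)["Name"])  # prints "chosen: slot3@b"
```

Because the idle test is multiplied by SlotID, every idle slot outranks every claimed one (which all score 0), and among idle slots the higher-numbered one is preferred.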
So if it can be confirmed that an idle slot matches the job but the
negotiator is nevertheless matching the job to some other slot that is
claimed, then we'll need to examine that closely and understand why the
pre-job rank expression is not having the expected effect. If the
negotiator had a stale view of the machine state (so that it doesn't
realize a machine is claimed), that could lead to this sort of behavior.
However, I see nothing in the configuration that would cause that, so we
may need to look at the negotiator log to see what is going on.
> What we think we see now is that matching is done
> against an arbitrary machine, whether it is idle or not, and the RANK
> expression means that no consideration is given to a running job, even
> when other idle nodes are available.
The machine RANK expression specifies which job the machine prefers. If
the job matches multiple machines, some idle and some claimed, and
NEGOTIATOR_PRE_JOB_RANK prefers to run the job on the idle machines,
then the machine RANK expression should not matter: the job goes to an
idle machine, so no RANK-based preemption of a running job takes place.
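To make that concrete, here is a hypothetical Python model of the ordering, based on our reading of the HTCondor manual: candidates are compared first by NEGOTIATOR_PRE_JOB_RANK and only then by the job's RANK (NEGOTIATOR_POST_JOB_RANK and the later preemption tie-breakers are omitted). The machine RANK appears only in the check for whether a claimed slot may be preempted; priority-based preemption is not modeled. The Slot class, its fields, and the slot names are invented for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    name: str
    slot_id: int
    remote_owner: Optional[str]       # None models UNDEFINED (unclaimed slot)
    rank_of_new_job: float = 0.0      # machine RANK evaluated against the new job
    rank_of_current_job: float = 0.0  # machine RANK of the job already running


def pre_job_rank(slot: Slot) -> int:
    # (RemoteOwner =?= UNDEFINED) * SlotID: claimed slots score 0.
    return int(slot.remote_owner is None) * slot.slot_id


def job_rank(slot: Slot) -> float:
    # Hypothetical job RANK with no preference among machines.
    return 0.0


def is_candidate(slot: Slot) -> bool:
    # Unclaimed slots are always candidates. A claimed slot is a candidate
    # only if its machine RANK strictly prefers the new job (RANK preemption).
    if slot.remote_owner is None:
        return True
    return slot.rank_of_new_job > slot.rank_of_current_job


def choose(slots: List[Slot]) -> Optional[Slot]:
    candidates = [s for s in slots if is_candidate(s)]
    if not candidates:
        return None
    # Tuples compare lexicographically, so pre-job rank dominates job RANK.
    return max(candidates, key=lambda s: (pre_job_rank(s), job_rank(s)))


slots = [
    # Busy slot whose machine RANK prefers the new job over the current one.
    Slot("slot1@busy", 1, "other", rank_of_new_job=10, rank_of_current_job=1),
    # Idle slot whose machine RANK is indifferent to the new job.
    Slot("slot1@idle", 1, None),
]

print(choose(slots).name)      # prints "slot1@idle": pre-job rank wins
print(choose(slots[:1]).name)  # prints "slot1@busy": no idle slot, so RANK preemption applies
```

Raising the busy slot's machine RANK for the new job changes nothing in the first call; only when no idle match exists does the machine RANK decide whether the running job is preempted.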