
Re: [Condor-users] Preemption question

Steven Timm wrote:

On Fri, 24 Mar 2006, Dan Bradley wrote:

There are a number of different possible causes of preemption in Condor,
and your policy eliminates most but not all of them.  The startd RANK
expression is treated as an overriding directive by the negotiator,
trumping the normal user-priority based calculations (and therefore
PREEMPTION_REQUIREMENTS).  This means that your policy will cause
precisely the kind of preemption that you have observed--members of
group "numi" will preempt other users.

All the various tutorials I've been to and manuals I have read
didn't tell me that.  Interesting.  I thought as long as we
had PREEMPTION_REQUIREMENTS false we wouldn't preempt.

Thanks for pointing that out. I am now fixing several places in the manual that give this misleading impression. The one place that told the truth is here:


"Note that PREEMPTION_REQUIREMENTS only applies to preemptions due to user priority. It does not have any effect if the machine rank expression prefers a different job, or if the startd policy expression causes the job to vacate due to other activity on the machine."

The effect we want to have is the following:

these 15 machines are owned by group_numi.
If the queue is full and all machines are claimed, and there
are jobs waiting from both group_numi and from others, then
on these 15 machines we want the job from group_numi to start,
independent of what user priority group_numi may have at the time.

This could be achieved with a policy such as the following:

RANK = (agroup == "group_numi") * 1000
# Allow preempted jobs a total of 4 days of wall time
MaxJobRetirementTime = 3600 * 24 * 4
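
As a rough illustration of how that RANK expression plays out, here is a small Python model. It is not Condor code, and the function names and the "group_other" value are made up for this sketch. It assumes that the startd preempts a claim only when the candidate job ranks strictly higher than the running job, and that a job ad without agroup ends up with rank 0.

```python
def startd_rank(job_ad):
    """Model of RANK = (agroup == "group_numi") * 1000."""
    # A job ad without agroup would make the expression UNDEFINED;
    # this sketch assumes that counts as rank 0.
    return 1000 if job_ad.get("agroup") == "group_numi" else 0


def rank_preempts(running_job, candidate_job):
    """Assumed rule: preempt only for a strictly higher machine rank."""
    return startd_rank(candidate_job) > startd_rank(running_job)


# Retirement time granted to a preempted job, in seconds (4 days).
MAX_JOB_RETIREMENT_TIME = 3600 * 24 * 4

numi = {"agroup": "group_numi"}
other = {"agroup": "group_other"}  # hypothetical non-numi group

print(rank_preempts(other, numi))  # True: numi takes the claim from another group
print(rank_preempts(numi, other))  # False: others never displace numi via RANK
print(rank_preempts(numi, numi))   # False: equal rank, no preemption
print(MAX_JOB_RETIREMENT_TIME)     # 345600
```

In this model the running job is not killed immediately; MaxJobRetirementTime gives it up to four days of wall time to finish before the claim changes hands.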

However, there is one additional consideration.  The above policy doesn't say that group_numi jobs will preferentially run on group_numi machines.  It just says that they have high priority to do so.  Therefore, if there are available machines in both group_numi and elsewhere, a group_numi job could land on either one with no preference either way.  This may or may not be what you want.  To preferentially steer group_numi jobs to group_numi machines, you can do something like the following:

MachineGroup = "group_numi"
NEGOTIATOR_PRE_JOB_RANK = (agroup =?= MachineGroup)*1 + (RemoteOwner =?= UNDEFINED)*2

That says to preferentially run jobs on idle machines and secondarily to prefer machines belonging to the same group.
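
To make the weighting concrete, the expression can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not of the negotiator itself; the machine names and the "group_other" value are invented for the example, and Python's None stands in for UNDEFINED.

```python
def pre_job_rank(job_ad, machine_ad):
    """Model of (agroup =?= MachineGroup)*1 + (RemoteOwner =?= UNDEFINED)*2."""
    # =?= is an exact identity test, so two missing values also compare equal;
    # comparing the results of .get() (None when absent) mirrors that.
    same_group = job_ad.get("agroup") == machine_ad.get("MachineGroup")
    # An unclaimed machine has no RemoteOwner attribute in its ad.
    idle = machine_ad.get("RemoteOwner") is None
    return int(same_group) * 1 + int(idle) * 2


job = {"agroup": "group_numi"}

# Hypothetical machine ads covering the four possible combinations.
machines = {
    "claimed_other": {"MachineGroup": "group_other", "RemoteOwner": "someone"},
    "idle_other": {"MachineGroup": "group_other"},
    "claimed_numi": {"MachineGroup": "group_numi", "RemoteOwner": "someone"},
    "idle_numi": {"MachineGroup": "group_numi"},
}

scores = {name: pre_job_rank(job, ad) for name, ad in machines.items()}
ordering = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ordering)  # ['idle_numi', 'idle_other', 'claimed_numi', 'claimed_other']
```

Because the idle term carries weight 2 and the group term weight 1, any idle machine outranks any claimed one, and group membership only breaks ties within each of those two classes.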

We would really rather not
have pre-emption happen at all, even if the cost is some idle
time on the cluster every once in a while.

By this, I assume you mean that you don't want _job_ preemption. In some cases you still appear to want _claim_ preemption. If that is the case, then setting MaxJobRetirementTime to a very large number is a good solution.

I had no idea up until now that a user through the schedd could keep a claim on a machine between the finishing of a job and the start of a new one. Where is there more information in the condor docs that
describes this situation? We may have to rethink our whole
strategy on how we do our batch system here.

In V6.6, there was no good solution for this problem (because MaxJobRetirementTime did not exist). Therefore, it is documented in the V6.6 manual in the section on disabling preemption:


However, in V6.7, this bit of knowledge does not appear in the manual, because the section on disabling preemption offers a solution that avoids the problem. Clearly, this section should still discuss some of the problems with alternative preemption-avoiding policies, because they are not obvious.