Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Preemption issues

Date: Wed, 20 Oct 2010 09:48:28 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] Preemption issues



On 10/20/10 9:05 AM, Jonathan D. Proulx wrote:

Hi All,

I'm looking to disable preemption on some of the systems in my cluster
running:

$ condor_version
$CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
$CondorPlatform: X86_64-LINUX_DEBIAN50 $

The goal being for running jobs never to be interrupted (which I know
isn't quite the same as not preempting claims).

My first attempt using the example in the manual (3.5.9.5):

#Disable preemption by machine activity.
PREEMPT = False
#Disable preemption by user priority.
PREEMPTION_REQUIREMENTS = False
#Disable preemption by machine RANK by ranking all jobs equally.
RANK = 0

still gets jobs preempted due to user priority (checked runtime values
with condor_config_val to see the values I expect are the ones
actually in use)

The above policy should definitely not allow preemption based on user priority. Are you setting PREEMPTION_REQUIREMENTS in the configuration of the negotiator? The rest of the settings apply to the worker node, but that setting applies to the negotiator.


Another configuration setting that you can apply to the negotiator is this:

NEGOTIATOR_CONSIDER_PREEMPTION = False

Given the above policy, this additional setting shouldn't change behavior, but it should result in more efficient negotiation, since the work can be avoided.
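
As a minimal sketch of the split described above, the negotiator-side part of the policy might look like the fragment below. It assumes the negotiator runs on the central manager and reads its own local configuration file there; where that file lives depends on the installation.

```
# Central manager (condor_negotiator) configuration.
# Sketch only: assumes this file is the one the negotiator actually reads.

# Evaluated by the negotiator, not by the worker nodes, so it must be
# set here to disable preemption by user priority.
PREEMPTION_REQUIREMENTS = False

# Optional: lets the negotiator skip preemption matchmaking altogether.
NEGOTIATOR_CONSIDER_PREEMPTION = False
```

After a condor_reconfig on the central manager, running "condor_config_val -negotiator PREEMPTION_REQUIREMENTS" should report the value the running negotiator is using, which may differ from what the worker nodes' configuration shows.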

My second attempt was to set a high MAXJOBRETIREMENTTIME, as suggested
in the same section. This "works", but queued jobs seem to get stuck to
a node that is doing this slow preemption and are not reassigned to
other resources if they become available. Since some jobs in the
cluster run for minutes and some for weeks, this is not really what I'm
looking for.

This is expected behavior. The "stickiness" has a timeout, controlled by REQUEST_CLAIM_TIMEOUT, which defaults to 30 minutes.
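
For illustration, shortening that timeout might look like the fragment below. This is a sketch under two assumptions: that the setting is read by the condor_schedd on the submit machine, and that the value is given in seconds (so the 30-minute default corresponds to 1800). The value 600 is an arbitrary example, not a recommendation.

```
# Submit machine (condor_schedd) configuration.
# Sketch only: 600 is an illustrative value, not a tested one.

# How long to wait for a requested claim before giving up on it,
# in seconds. The default is 30 minutes.
REQUEST_CLAIM_TIMEOUT = 600
```

A shorter timeout would let a job that is waiting on a slowly retiring node be rematched sooner, at the cost of abandoning claims that would have been granted shortly afterwards.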

I had thought this was working previously, and it has been an
advertised feature of our cluster for years, but I'm honestly not
certain whether the behaviour has changed or whether it was simply
insufficiently tested in the past.

I can't think of any changes in recent versions of Condor that would impact the above policies.


--Dan

