
Re: [Condor-users] Eviction and dynamic provisioning



Preemption works as it always has, at the level of a single slot: a small job can preempt a large job, and a job can preempt one of the same size, but you won't get jobs preempting jobs that are smaller than themselves.

If you're seeing issues between your 6G and 8G jobs, you may want to look at the set of slots each of them matches. Possibly the 6G jobs are not matching the slots whose resources could serve both 6G and 8G jobs.
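
For example (just a sketch, assuming your execute nodes advertise the usual PartitionableSlot attribute and that the condor_status/condor_q options below behave the same in 7.4.2), you could compare what the partitionable slots still have free against what an idle 6G job asks for:

  # partitionable slots with at least 6G of unclaimed memory
  condor_status -constraint 'PartitionableSlot =?= TRUE && Memory >= 6000' \
      -format "%s " Name -format "%d\n" Memory

  # why a particular idle 6G job (placeholder job id) is not matching
  condor_q -analyze <cluster>.<proc>

If the first command lists slots that the second says the 6G job can't use, the problem is in the requirements/matching rather than in preemption.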

Best,


matt

On 10/04/2010 11:54 AM, Greg Langmead wrote:
Hi. I'm reposting this message from a couple of weeks ago to see if anyone
can help me understand how eviction calculations are done in the presence
of dynamic provisioning. I continue to see jobs that meet my preemption
requirements go unevicted, and so users are getting starved. Since slots
are dynamic and get destroyed and recreated, it's not even clear what
eviction means here, is that right? So I'm guessing there's another
paradigm for balancing users out.

Thanks.

Greg Langmead | Senior Research Scientist | SDL Language Weaver | (t) +1 310 437 7300

On 9/22/10 3:58 PM, "Greg Langmead" <glangmead@xxxxxxxxxxxxxxxxxx> wrote:

We have a pool of about 1000 CPUs (Fedora Core 12) being managed with
Condor 7.4.2 with dynamic provisioning. The queue is busy today and I'm not
liking what I'm seeing. I have three users, each with a long backlog of
idle jobs:

- user1 has lots of jobs with request_memory = 8000, request_cpus = 1, and priority 112
- user2 has lots of jobs with request_memory = 6000, request_cpus = 1, and priority 5
- user3 has lots of jobs with request_memory = 4000, request_cpus = 1, and priority 32

Most machines have 32G of RAM with 16 actual cores (but one partitionable
Condor slot). My eviction settings (honed over 3 years of usage in the static
provisioning environment) are:

PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > (6 * (60 * 60)) ) && ( RemoteUserPrio > (SubmitterPrio * 1.2) )

i.e., let jobs run for 6 hours, after which they can be evicted by a user
whose priority is better by a factor of 1.2.
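
Plugging in the numbers above (and assuming those priority values are the
effective user priorities the negotiator actually compares), the priority
half of that test passes easily when user2's 6G jobs go after user1's 8G jobs:

  RemoteUserPrio > (SubmitterPrio * 1.2)
  112            > (5 * 1.2) = 6      -->  TRUE

so once a job has been running for 6 hours, I'd expect only the matching
itself to stand in user2's way.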

Here's what I'm observing:

- the user with request_memory = 8000 is getting jobs served every negotiation cycle
- the user with request_memory = 6000 is starved out
- the user with request_memory = 4000 is getting jobs served every negotiation cycle

Moreover, the 8G jobs may run indefinitely and are never evicted. Many have
run for over 8 hours.
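
As a rough check (a sketch, assuming the claimed slots advertise the standard
RemoteUser and EnteredCurrentState attributes), something like this prints
each claimed slot, the user running on it, and the Unix time at which it
entered its current state, to compare against the current time:

  condor_status -constraint 'State == "Claimed"' \
      -format "%s " Name -format "%s " RemoteUser -format "%d\n" EnteredCurrentState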

So my question is, how is eviction done with dynamic provisioning? Is a 6G
job even compared to an 8G one to see if it might preempt it? Also, when a new
negotiation cycle is started, how could an 8G job from a user with terrible
priority get run when a 6G job from a user with better priority does not? I
can understand why a 4G job might slip in when a 6G one doesn't fit, so it's
the 8G versus 6G competition that is not working.

Many thanks,
Greg