
[Condor-users] Eviction and dynamic provisioning



We have a pool of about 1000 CPUs (Fedora Core 12) managed by Condor 7.4.2 with dynamic provisioning. The queue is busy today and I'm not liking what I'm seeing. Three users each have a long backlog of idle jobs:

- user1 has lots of jobs with request_memory = 8000, request_cpus = 1, and priority 112
- user2 has lots of jobs with request_memory = 6000, request_cpus = 1, and priority 5
- user3 has lots of jobs with request_memory = 4000, request_cpus = 1, and priority 32

Most machines have 32 GB of RAM and 16 physical cores (but a single partitionable Condor slot). My eviction settings, honed over 3 years of use in a static provisioning environment, are:
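With dynamic provisioning, each matched job carves a dynamic slot out of the partitionable slot's free memory and cores. The Python sketch below is a simplified model, not Condor code. It assumes 32000 MB of usable memory per machine, a hypothetical round figure since the memory Condor actually detects may differ slightly. It shows that memory, not cores, is the limiting resource for all three job sizes.

```python
from dataclasses import dataclass


@dataclass
class PartitionableSlot:
    """Simplified model of one partitionable slot's unclaimed resources (hypothetical)."""
    memory_mb: int
    cpus: int

    def carve(self, request_memory: int, request_cpus: int) -> bool:
        """Carve out a dynamic slot if the request fits; return whether it did."""
        if request_memory <= self.memory_mb and request_cpus <= self.cpus:
            self.memory_mb -= request_memory
            self.cpus -= request_cpus
            return True
        return False


def fill(request_memory: int, memory_mb: int = 32000, cpus: int = 16) -> tuple[int, PartitionableSlot]:
    """Carve identical single-core jobs until the slot can take no more."""
    slot = PartitionableSlot(memory_mb, cpus)
    count = 0
    while slot.carve(request_memory, 1):
        count += 1
    return count, slot


for mem in (8000, 6000, 4000):
    n, left = fill(mem)
    print(f"{mem} MB jobs: {n} fit, {left.memory_mb} MB and {left.cpus} cores left")
```

Under this model a machine holds 4 of the 8000 MB jobs (12 cores idle), 5 of the 6000 MB jobs (2000 MB stranded), or 8 of the 4000 MB jobs.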

PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > (6 * (60 * 60))) && ( RemoteUserPrio > (SubmitterPrio * 1.2 ))

i.e., let jobs run for 6 hours, after which they can be evicted by a user whose priority is better by more than a factor of 1.2.
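For reference, the expression above can be mirrored as a small Python predicate. This is only a sketch of the policy's logic, not how the negotiator evaluates ClassAds. It assumes times are Unix seconds and that, as in Condor, a smaller user priority value is a better priority.

```python
HOUR = 60 * 60


def preemption_allowed(current_time: int, entered_current_state: int,
                       remote_user_prio: float, submitter_prio: float) -> bool:
    """Mirror of the PREEMPTION_REQUIREMENTS expression (smaller priority value = better)."""
    # The running job's slot must have been in its current state for over 6 hours.
    ran_long_enough = (current_time - entered_current_state) > 6 * HOUR
    # The running job's owner must have a priority value more than 1.2x the submitter's.
    much_worse_owner = remote_user_prio > submitter_prio * 1.2
    return ran_long_enough and much_worse_owner


now = 1_000_000
# user1 (priority 112) has held a slot for 8 hours; user2 (priority 5) is waiting.
print(preemption_allowed(now, now - 8 * HOUR, 112, 5))
# Same owners, but only 5 hours in the current state.
print(preemption_allowed(now, now - 5 * HOUR, 112, 5))
```

Read this way, the policy should make user1's 8G jobs that have run longer than 6 hours eligible for preemption by user2.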

Here's what I'm observing:

- the user with request_memory = 8000 is getting jobs served every negotiation cycle
- the user with request_memory = 6000 is starved out
- the user with request_memory = 4000 is getting jobs served every negotiation cycle

Moreover, the 8G jobs appear to run indefinitely and are never evicted; many have run for over 8 hours.

So my question is: how is eviction done with dynamic provisioning? Is a 6G job even compared to an 8G one to see if it might preempt it? Also, when a new negotiation cycle starts, how can an 8G job from a user with terrible priority get to run when a 6G job from a user with better priority does not? I can understand why a 4G job might slip in when a 6G one doesn't fit, so it's the 8G versus 6G competition that is not working.
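To make the expectation in the question concrete, here is a deliberately naive Python model of one negotiation cycle. It is hypothetical: it assumes submitters are visited strictly in priority order (smallest value first) and ignores cores, Rank, group quotas and preemption, so it describes what the question expects rather than what Condor 7.4.2 actually does. The free-memory figures are made up for illustration.

```python
def negotiate(free_memory_mb: list[int],
              submitters: list[tuple[str, float, int]]) -> list[tuple[str, int]]:
    """Naive negotiation cycle (hypothetical model, not Condor's algorithm).

    Each submitter is (name, priority, request_memory) with an unlimited backlog
    of identical jobs. Submitters are visited best priority first, and each one
    greedily fills any machine that still has enough free memory.
    Returns (submitter, machine index) pairs in match order.
    """
    free = list(free_memory_mb)
    matches = []
    for name, _prio, request_memory in sorted(submitters, key=lambda s: s[1]):
        for i in range(len(free)):
            while free[i] >= request_memory:
                free[i] -= request_memory
                matches.append((name, i))
    return matches


users = [("user1", 112, 8000), ("user2", 5, 6000), ("user3", 32, 4000)]
print(negotiate([4000, 5000, 8000], users))
```

In this model user2 claims the only machine with 8000 MB free before user1 is ever considered, and user3 fills the smaller gaps; user1 would only be matched where 8000 MB remained after both better-priority users had been served.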

Many thanks,
Greg