[Condor-users] Eviction and dynamic provisioning
- Date: Wed, 22 Sep 2010 14:58:03 -0500
- From: Greg Langmead <glangmead@xxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] Eviction and dynamic provisioning
We have a pool of about 1,000 CPUs (Fedora Core 12) managed with Condor 7.4.2 and dynamic provisioning. The queue is busy today and I'm not liking what I'm seeing. I have three users, each with a long backlog of idle jobs:
- user1 has lots of jobs with request_memory = 8000, request_cpus = 1, and priority 112
- user2 has lots of jobs with request_memory = 6000, request_cpus = 1, and priority 5
- user3 has lots of jobs with request_memory = 4000, request_cpus = 1, and priority 32
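For concreteness, a quick back-of-the-envelope check (plain Python, assuming the 32 GB / 16-core machines described below and single-core jobs) shows that memory, not cores, is the binding resource for every one of these job classes:

```python
# Toy arithmetic, not Condor code: how many single-core jobs of each
# memory class fit on one 32000 MB / 16-core partitionable slot.
machine_mb, machine_cpus = 32000, 16

for req_mb in (8000, 6000, 4000):
    # The slot can be carved until either memory or cores run out;
    # with request_cpus = 1, memory is always the tighter limit here.
    fit = min(machine_mb // req_mb, machine_cpus // 1)
    print(req_mb, fit)
# 8000 -> 4 jobs, 6000 -> 5 jobs, 4000 -> 8 jobs per machine
```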
Most machines have 32G of RAM and 16 actual cores (but one partitionable Condor slot). My eviction settings (honed over three years of use in the static-provisioning environment) are:
PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > (6 * (60 * 60))) && ( RemoteUserPrio > (SubmitterPrio * 1.2 ))
i.e., let jobs run for 6 hours, after which they can be evicted by a user whose priority is better by a factor of 1.2.
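To make the two conditions explicit, here is a Python rendering of that expression (illustrative names only, not anything Condor evaluates itself; recall that in Condor a *larger* user priority value is *worse*):

```python
# Hedged sketch of the PREEMPTION_REQUIREMENTS expression above.
def may_preempt(current_time, entered_current_state,
                remote_user_prio, submitter_prio):
    """True when the running job is old enough AND the candidate
    submitter's priority beats the running user's by a 1.2 factor."""
    ran_long_enough = (current_time - entered_current_state) > 6 * 60 * 60
    prio_beats = remote_user_prio > submitter_prio * 1.2
    return ran_long_enough and prio_beats

# An 8-hour-old job owned by user1 (prio 112) versus candidate
# user2 (prio 5): 112 > 5 * 1.2, so preemption should be allowed.
print(may_preempt(8 * 3600, 0, 112, 5))  # True
```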
Here's what I'm observing:
- the user with request_memory = 8000 is getting jobs served every negotiation cycle
- the user with request_memory = 6000 is starved out
- the user with request_memory = 4000 is getting jobs served every negotiation cycle
Moreover, the 8G jobs may run indefinitely and are never evicted; many have run for over 8 hours.
So my question is, how is eviction done with dynamic provisioning? Is a 6G job even compared against a running 8G one to see if it might preempt it? Also, when a new negotiation cycle starts, how can an 8G job from a user with terrible priority get run when a 6G job from a user with better priority does not? I can understand why a 4G job might slip in when a 6G one doesn't fit, so it's the 8G-versus-6G competition that is not working.
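For what it's worth, a toy first-fit simulation (plain Python, nothing Condor-specific; the slot-carving model is deliberately simplified and assumes no preemption) shows how memory fragmentation alone can let 4G jobs through while 6G jobs starve:

```python
# Toy model: one 32000 MB partitionable slot, jobs placed first-fit
# by memory only. Once 8G jobs have carved the slot down and a 4G job
# grabs a freed 8000 MB hole, the remainder is >= 4000 but < 6000 MB.
def try_place(free_mb, request_mb):
    """First-fit: return the index of a machine with enough free memory."""
    for i, free in enumerate(free_mb):
        if free >= request_mb:
            return i
    return None

machines = [32000]                      # one partitionable slot, 32 GB
for req in [8000, 8000, 8000, 8000]:    # four 8G jobs fill the machine
    machines[try_place(machines, req)] -= req

machines[0] += 8000                     # one 8G job exits
machines[0] -= 4000                     # a 4G job slips into the hole first
print(try_place(machines, 6000))        # None: only 4000 MB left, 6G starves
print(try_place(machines, 4000))        # 0: another 4G job still fits
```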
Many thanks,
Greg