Re: [Condor-users] Eviction and dynamic provisioning



Hi. I'm reposting this message from a couple of weeks ago to see if anyone can
help me understand how eviction calculations are done in the presence of
dynamic provisioning. I continue to see jobs that meet my preemption
requirements not being evicted, so users are getting starved. Since slots are
dynamic, it's not even clear that eviction means anything, because slots get
destroyed and recreated. Is that right? If so, I'm guessing there's another
paradigm for balancing users.

Thanks.  

Greg Langmead | Senior Research Scientist | SDL Language Weaver | (t) +1 310 437 7300

On 9/22/10 3:58 PM, "Greg Langmead" <glangmead@xxxxxxxxxxxxxxxxxx> wrote:

> We have a pool of about 1000 CPUs (Fedora Core 12) managed by Condor 7.4.2
> with dynamic provisioning. The queue is busy today and I don't like what I'm
> seeing. I have three users, each with a long backlog of idle jobs:
> 
> - user1 has lots of jobs with request_memory = 8000, request_cpus = 1, and
> priority 112
> - user2 has lots of jobs with request_memory = 6000, request_cpus = 1, and
> priority 5
> - user3 has lots of jobs with request_memory = 4000, request_cpus = 1, and
> priority 32
> 
> Most machines have 32G of RAM and 16 physical cores, presented as a single
> partitionable Condor slot. My eviction settings (honed over 3 years of usage
> in the static provisioning environment) are:
> 
> PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > (6 * (60 * 60)) ) && ( RemoteUserPrio > (SubmitterPrio * 1.2) )
> 
> i.e., let jobs run for 6 hours, after which they can be evicted by a user
> whose priority is better by a factor of more than 1.2.
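
To make the expression above concrete, here is a minimal Python sketch that evaluates the same condition for the priorities quoted in this message. The function and argument names are hypothetical and only mirror the ClassAd attributes in the expression. It assumes the usual Condor convention that a numerically lower user priority is better, and it says nothing about how the negotiator actually treats partitionable slots.

```python
SIX_HOURS = 6 * 60 * 60  # seconds, same constant as in the expression


def preemption_allowed(current_time, entered_current_state,
                       remote_user_prio, submitter_prio):
    """Mirror of the PREEMPTION_REQUIREMENTS expression (illustrative only).

    remote_user_prio: priority of the user whose job is running on the slot.
    submitter_prio:   priority of the user whose idle job wants the slot.
    Lower numbers are assumed to mean better priority.
    """
    ran_long_enough = (current_time - entered_current_state) > SIX_HOURS
    clearly_worse_prio = remote_user_prio > submitter_prio * 1.2
    return ran_long_enough and clearly_worse_prio


# user1 (priority 112) has been running for 8 hours; user2 (priority 5) is idle.
user2_can_preempt_user1 = preemption_allowed(8 * 3600, 0, 112, 5)

# Same pair, but the running job is only 5 hours old.
too_early = preemption_allowed(5 * 3600, 0, 112, 5)

# Reverse case: user2 is running and user1 would like the slot.
user1_can_preempt_user2 = preemption_allowed(8 * 3600, 0, 5, 112)
```

With these inputs the first call evaluates to true and the other two to false, so the expression by itself would permit a user2 job to preempt an 8-hour-old user1 job. That suggests the starvation comes from how matches against dynamic slots are considered rather than from the expression.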
> 
> Here's what I'm observing:
> 
> - the user with request_memory = 8000 is getting jobs served every negotiation
> cycle
> - the user with request_memory = 6000 is starved out
> - the user with request_memory = 4000 is getting jobs served every negotiation
> cycle
> 
> Moreover, the 8G jobs seem to run indefinitely and are never evicted. Many
> have run for over 8 hours.
> 
> So my question is, how is eviction done with dynamic provisioning? Is a 6G job
> even compared to an 8G one to see if it might preempt it? Also, when a new
> negotiation cycle starts, how can an 8G job from a user with terrible
> priority get run when a 6G job from a user with better priority does not? I
> can understand why a 4G job might slip in when a 6G one doesn't fit, so it's
> the 8G versus 6G competition that is not working.
> 
> Many thanks,
> Greg