Re: [Condor-users] Whole System Scheduling



Hi Dan,

Thanks for the detailed notes on the recipe.  I'll go through it again
with your suggested changes.  I thought I'd tried replacing the SUSPEND
statement, but it may have been before I fixed another issue that was
giving me trouble.

Dynamic Provisioning sounds like a move in the right direction, but
I don't think I'll hang all my hopes there.

On Fri, Oct 23, 2009 at 11:35:05AM -0500, Dan Bradley wrote:
:
:
:Jonathan D. Proulx wrote:

:> My fondest wish would be for Condor to be able to allocate multiple CPUs and
:> jobs could simply require some number (which they could if I
:> configured a matrix of mutually exclusive slots I guess but as we get
:> up into the world of 16 and more cores this gets crazy)
:>   
:Agreed.  This is the intention of the recently added dynamic slot support:
:
:http://www.cs.wisc.edu/condor/manual/v7.2/3_12Setting_Up.html#SECTION004127900000000000000
:
:However, this feature currently does not provide a good solution for 
:"defragmenting".  What I mean is that if there is a steady supply of 
:single-cpu jobs, then jobs requiring more than one cpu may never get 
:scheduled unless they are lucky and a bunch of single cpu jobs all exit 
:at the same time.  One workaround is to enforce a periodic drain so that 
:each execute node stops accepting more jobs until all slots are idle.

Periodic drain is a good idea.

Another issue I'm seeing, with my half hour of experience using dynamic
slots, is that they split slowly.  Since the partitionable slot only
matches once per negotiation cycle (about 5 minutes on my test system), it
takes N * NegotiationCycle to fully populate an N-way system with
single-processor jobs (40 minutes for my system).  This is also less
than optimal.
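
To make the arithmetic concrete, here is a rough sketch of the fill-time
estimate.  It is plain arithmetic, not a model of the Condor negotiator:
the function name and the matches_per_cycle parameter are made up for
illustration, and N = 8 is only inferred from the 40 minute and 5 minute
figures above.

```python
def minutes_to_fill(n_cpus, cycle_minutes, matches_per_cycle=1):
    """Estimate minutes until n_cpus single-CPU jobs are all running.

    Assumes (hypothetically) that the partitionable slot is carved up
    matches_per_cycle times in each negotiation cycle.
    """
    # Ceiling division: a partially used cycle still costs a full cycle.
    cycles = -(-n_cpus // matches_per_cycle)
    return cycles * cycle_minutes


# One match per cycle, 5 minute cycle, 8 cores (inferred from 40 / 5).
print(minutes_to_fill(8, 5))   # 40
# The same behaviour on a 16-core machine would take twice as long.
print(minutes_to_fill(16, 5))  # 80
```

The point is that the delay grows linearly with core count, so it gets
worse on exactly the bigger machines where dynamic slots matter most.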