
Re: [HTCondor-users] Dynamic slots and concurrency...

On 11/5/2015 3:40 PM, Jody Pearson wrote:
> Hi all,
> I'm wondering if concurrency limits are supposed to work with dynamic 
> slots ?

Yes, BUT you will likely want to configure HTCondor to use "Consumption Policies", which are not enabled by default.  See the Consumption Policies section of the HTCondor manual for more information.

The short story is that HTCondor has two different methods it can use to quickly assign dynamic slots.

The first method involves the condor_negotiator giving a schedd an entire machine (i.e. all the resources in a partitionable slot); the schedd then splits that partitionable slot into a bunch of dynamic slots without any additional help from the negotiator on the central manager.  The advantage of this method is scalability: less work needs to be performed on the single central manager node, because the work is instead pushed out to the schedd (and you can always add more schedd services to your pool).  The disadvantage is that unexpected behavior can result if you are using concurrency limits.  This method is the default, and is enabled by setting CLAIM_PARTITIONABLE_LEFTOVERS = True.
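For reference, a concurrency limit is defined by a *_LIMIT knob in the central manager's configuration and referenced by name from a job's submit file.  A minimal sketch (the limit name "xsw" and the count of 10 are made-up examples, not anything from your pool):

  # central manager config: allow at most 10 concurrently
  # running jobs that declare the (hypothetical) xsw limit
  XSW_LIMIT = 10

  # in the job's submit file:
  concurrency_limits = xsw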

The second method involves the condor_negotiator splitting up a partitionable slot into dynamic slots itself, and then handing these dynamic slots one by one to the schedd.  An advantage here is that concurrency limits should work as expected; the disadvantage is scalability, because the negotiator now has to do more work.  This method is enabled by setting CONSUMPTION_POLICY = True on your execute nodes.  If your pool has ~10k slots or fewer, I wouldn't worry much about the scalability disadvantage of CONSUMPTION_POLICY = True.

You will want to add the following to your condor_config file(s) on your execute nodes (or simply make the change on all machines in your pool):

  # Disable CLAIM_PARTITIONABLE_LEFTOVERS and instead enable
  # Consumption Policies so that concurrency limits behavior
  # works as expected.
  CLAIM_PARTITIONABLE_LEFTOVERS = False
  CONSUMPTION_POLICY = True

And Jody, to keep the same policy you had for rounding off the 
requested memory and cpus, you will also want to add the following to the condor_config on your execute nodes:

  # When using Consumption Policies, the syntax / knob names for 
  # modifying resources requests at the startd is different; these
  # settings are the Consumption Policy equivalent of 
  # the MODIFY_REQUEST_* knobs.
  # give out memory in chunks of 5371
  CONSUMPTION_MEMORY = quantize(RequestMemory,5371)
  # modify cpu to be based on percentage of memory
  CONSUMPTION_CPUS = quantize(RequestCpus,ceiling(real(RequestMemory)/128905 *24))
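To sanity-check those expressions with some made-up numbers (quantize(x, y) rounds x up to the next multiple of y): a hypothetical job requesting 4000 MB and 1 cpu would evaluate as

  # hypothetical job: RequestMemory = 4000, RequestCpus = 1
  # CONSUMPTION_MEMORY: quantize(4000, 5371)            -> 5371
  # CONSUMPTION_CPUS:   ceiling(real(4000)/128905 * 24) -> ceiling(0.745) = 1
  #                     quantize(1, 1)                  -> 1

so the job is handed a 5371 MB chunk of memory and 1 cpu, matching the intent of the comments above.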

After you make the above changes, you will need to restart the condor_startd daemons,
as unfortunately I don't think a reconfig will do it.  So, for instance, if you
are impatient and don't care about interrupting jobs currently running, you could enter
something like the following from your central manager:
   condor_restart -fast -startd -all

Now at this point your concurrency limits should work the same on your static and partitionable slot machines.

Hope the above helps,