
Re: [HTCondor-users] Partitionable Slots




On 6/19/2014 11:53 AM, Douglas Thain wrote:
(Based on some previous discussions, we set CLAIM_WORKLIFE=0, so as to
force claims to expire at the end of each job, thus causing them to be
returned to the parent partitionable slot.  But, that doesn't seem to
be happening.)


One thought on why CLAIM_WORKLIFE=0 may not be giving you the result you wanted:

A dynamic slot is returned to its parent partitionable slot whenever it enters the UNCLAIMED state. By setting CLAIM_WORKLIFE=0, you ensure that a slot goes back to the UNCLAIMED state when a running job completes.

But perhaps your pool allows user preemption? Imagine userA submits a 4-cpu job, resulting in a dynamic slot with 4 cpus. Now along comes userB, and say userB has a better priority such that userB can preempt slots claimed by userA, and your policy allows such preemptions to occur. In this situation, the dynamic slot never enters the UNCLAIMED state; it just goes from claimed by userA to claimed by userB, so it is never returned to the parent partitionable slot. You could edit PREEMPTION_RANK so that when such a preemption occurs it tries to do a best-fit on cpu cores.
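To make the two paths concrete, here is a small toy model in Python. It is only a sketch of the behaviour described above: the class and method names (PartitionableSlot, DynamicSlot, carve, job_finished, preempt) are made up for illustration and are not HTCondor APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionableSlot:
    """Hypothetical stand-in for a partitionable slot: just a pool of free cpus."""
    free_cpus: int

    def carve(self, cpus: int, owner: str) -> "DynamicSlot":
        # Split off a dynamic slot sized to the job's request.
        if cpus > self.free_cpus:
            raise ValueError("not enough free cpus in the partitionable slot")
        self.free_cpus -= cpus
        return DynamicSlot(parent=self, cpus=cpus, owner=owner)


@dataclass
class DynamicSlot:
    """Hypothetical stand-in for a dynamic slot carved from a parent."""
    parent: PartitionableSlot
    cpus: int
    owner: Optional[str]
    state: str = "Claimed"

    def job_finished(self) -> None:
        # Assumes CLAIM_WORKLIFE=0: the claim ends with the job, the slot
        # becomes Unclaimed, and its cpus go back to the parent.
        self.state = "Unclaimed"
        self.owner = None
        self.parent.free_cpus += self.cpus
        self.cpus = 0

    def preempt(self, new_owner: str) -> None:
        # User preemption: the claim moves straight to the new owner.
        # The slot never passes through Unclaimed, so nothing is returned.
        self.owner = new_owner


pslot = PartitionableSlot(free_cpus=8)
dslot = pslot.carve(4, "userA")   # userA's 4-cpu job
dslot.preempt("userB")            # userB takes over the same 4-cpu slot
print(dslot.state, dslot.cpus, pslot.free_cpus)
dslot.job_finished()              # only now do the cpus return to the parent
print(dslot.state, pslot.free_cpus)
```

In this model, preempt() leaves the slot at 4 cpus no matter how many cpus userB's job actually asked for, which is the situation described above.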

Meanwhile, I think we'll look into changing the code so that the default requirements of the job will be
  requirements = (whatever was there before) &&  \
          (DynamicSlot =!= True || Cpus =?= RequestCpus)
... or doing something equivalent on the startd. This seems like a policy most people would want and expect, so it seems reasonable as a built-in default; people can always override it with something different.
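As a rough illustration of what that extra clause would accept and reject, here is a Python sketch. It models slot and job ads as plain dicts and an UNDEFINED attribute as a missing key (None); the helper names are hypothetical, and the two comparison functions only approximate the ClassAd =?= and =!= operators (same type and value, never UNDEFINED).

```python
def meta_equals(a, b):
    # Approximation of ClassAd "=?=": identical type and value.
    # A missing attribute is modeled as None, so None =?= None is True.
    return type(a) is type(b) and a == b


def meta_not_equals(a, b):
    # Approximation of ClassAd "=!=": the negation of "=?=".
    return not meta_equals(a, b)


def slot_accepts(slot, job):
    # Mirrors: (DynamicSlot =!= True || Cpus =?= RequestCpus)
    return (meta_not_equals(slot.get("DynamicSlot"), True)
            or meta_equals(slot.get("Cpus"), job.get("RequestCpus")))


partitionable = {"PartitionableSlot": True, "Cpus": 8}
dynamic_4cpu = {"DynamicSlot": True, "Cpus": 4}

one_cpu_job = {"RequestCpus": 1}
four_cpu_job = {"RequestCpus": 4}

# DynamicSlot is undefined on the partitionable slot, so any size may match it.
print(slot_accepts(partitionable, one_cpu_job))
# A 1-cpu job is kept off an existing 4-cpu dynamic slot...
print(slot_accepts(dynamic_4cpu, one_cpu_job))
# ...while a 4-cpu job may still reuse it.
print(slot_accepts(dynamic_4cpu, four_cpu_job))
```

Under these assumptions, a job can only land on an existing dynamic slot whose Cpus exactly equals its RequestCpus, so a preempting job with a different size would have to match the partitionable slot instead.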

hope this helps,
Todd