
[HTCondor-users] Partitionable Slots



Todd -

Thanks for the suggestions. Changing the requirements seems more
directly effective to me. And I think you are right about
user preemption, since this only seems to occur with competing users.
We are giving it a try and will see what happens...

Doug


On 6/19/2014 11:53 AM, Douglas Thain wrote:
> Howdy -
>
> We have been using partitionable slots to run multi-core jobs for the
> last few months.  We are set up to have a single partitionable slot
> and no static slots, divided by CPU.   Our users are submitting a mix
> of jobs, using request_cpus to select the size of slot desired.
>
> When initially turned on, it works.  Slots get created for the exact
> size of each job, so that, for example, a two-core job is matched to a
> two-core slot.  However, after a while, jobs begin to be matched in
> slots that are too big.  For example, we see lots of one-cpu jobs
> running on 4-cpu slots.
>
> How do we fix this so that jobs only run in slots of the appropriate size?
>
> (Based on some previous discussions, we set CLAIM_WORKLIFE=0, so as to
> force claims to expire at the end of each job, thus causing them to be
> returned to the parent partitionable slot.  But, that doesn't seem to
> be happening.)
>
> The relevant configuration is:
>
> NUM_SLOTS = 1
> NUM_SLOTS_TYPE_1 = 1
> SLOT_TYPE_1 = cpus=100%
> SLOT_TYPE_1_PARTITIONABLE = true
> CLAIM_WORKLIFE = 0
>
> Any suggestions?
>

Hi Doug -

I tried the above config on my Windows 7 laptop using the v8.2.0 release
candidate and everything seemed to work as expected, i.e. with
CLAIM_WORKLIFE = 0 the claim expired at the end of the job and thus the
dynamic slot was returned to the parent partitionable slot.  When I
commented out the CLAIM_WORKLIFE=0 line, then the dynamic slot was reused.

Besides the CLAIM_WORKLIFE = 0 trick (that is a condor_startd knob, btw,
maybe you only set it in your central manager), you could also put the
following into your job requirements expression:

   requirements = DynamicSlot =!= True || Cpus =?= RequestCpus

The above will allow the job to match any static or partitionable slot,
but if the slot is a dynamic slot, it will only re-use the dynamic slot
if it has the same number of CPUs.  I like this better than the
CLAIM_WORKLIFE workaround because you still get the advantages of
reusing claims.
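
As a rough sketch, a submit description using this might look like the
following. The executable name and the core count are placeholders, not
anything from your setup:

   # hypothetical submit file; my_job.sh is a placeholder
   executable   = my_job.sh
   request_cpus = 2
   # match static or partitionable slots freely, but only re-use a
   # dynamic slot whose Cpus equals this job's RequestCpus
   requirements = DynamicSlot =!= True || Cpus =?= RequestCpus
   queue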

Of course, you could inject the above into all jobs via
APPEND_REQUIREMENTS in the config file, or you could opt to put the
above constraint into the startd START expression to enforce this "exact
CPU fit" policy on the startd side.
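
For illustration, the two config-side options could be sketched roughly
as below. This is untested; the explicit TARGET. prefixes and the
$(START) self-reference are my assumptions about how you would want to
fold it into an existing config, so adjust to taste:

   # Option 1, on the submit machine(s): condor_submit appends this
   # to the requirements of every job it submits
   APPEND_REQUIREMENTS = (TARGET.DynamicSlot =!= True || TARGET.Cpus =?= RequestCpus)

   # Option 2, on the execute machines: enforce the same policy in the
   # startd, keeping whatever START policy was already defined
   START = ($(START)) && (DynamicSlot =!= True || Cpus =?= TARGET.RequestCpus)

Only one of the two should be needed. After changing the startd config,
a condor_reconfig on the execute nodes should pick it up.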

Hope the above helps,
Todd



> Doug
>
> P.S. We have a neat little display that shows slot size based on CPU:
> http://condor.cse.nd.edu/condor_matrix.cgi


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685