
Re: [HTCondor-users] strict dynamic slot matching

You have the right basic idea, but you need a Requirements expression that matches both the partitionable slot and the dynamic slot. The partitionable slot will often have GPUs and TotalSlotGPUs values larger than RequestGPUs, and you still want to match it, so your expression should apply the exact-count check only to the dynamic slot.


Like this:


   Requirements = TARGET.DynamicSlot is undefined || TotalSlotGPUs == RequestGPUs

Or this, which also takes static slots into account:


   Requirements = IfThenElse(PartitionableSlot is undefined, GPUs == RequestGPUs, GPUs >= RequestGPUs)
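For context, either expression goes in the job's submit description next to the GPU request. A minimal sketch using the first form; the executable name is a placeholder, and request_gpus is the submit command that sets RequestGPUs in the job ad:

   # Hypothetical submit description (run_gpu_job.sh is a placeholder).
   executable   = run_gpu_job.sh
   request_gpus = 1
   # A partitionable slot has no DynamicSlot attribute, so it passes the
   # first clause; a dynamic slot (e.g. one reused under a live claim)
   # must have exactly the number of GPUs this job requested.
   requirements = TARGET.DynamicSlot is undefined || TotalSlotGPUs == RequestGPUs
   queue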

But it might be better to do this as configuration on the execute node:

   START = DynamicSlot is undefined || RequestGPUs is undefined || GPUs == RequestGPUs
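A sketch of how that might look in the execute node's configuration, assuming you want to keep whatever START policy is already in effect and add the GPU check to it. The file location is an assumption (for example, a file in the directory named by LOCAL_CONFIG_DIR); the self-reference $(START) expands to the previously defined value:

   # Hypothetical execute-node config fragment.
   # $(START) is the START expression defined earlier in the config, so the
   # GPU check is ANDed onto the existing policy instead of replacing it.
   START = ($(START)) && (DynamicSlot is undefined || RequestGPUs is undefined || GPUs == RequestGPUs)

After editing, running condor_reconfig on that node should make the startd pick up the new expression.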




From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Pezzarossi, Gianni
Sent: Wednesday, May 4, 2022 12:16 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] strict dynamic slot matching


Hey everyone,

I had an idea, and I'm more curious whether it would work than whether it is a good idea.

We have some users who complain that, very intermittently, a job requesting 1 GPU gets matched to a slot with 2 GPUs. I suspect this happens when jobs end before CLAIM_WORKLIFE has expired, which leaves the dynamic slot free to pick up a new job. If the original claim was for 2 GPUs and a 1-GPU job is waiting in the queue, the matchmaker calls it good enough and lets the job run (I assume a resource request means “the slot must have at least this much” rather than “the slot must have exactly this much”).


I can see why it is done this way: it helps throughput and lets the most jobs run, rather than trying to optimize resource usage (correct me if I’m wrong).
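One way to test the claim-reuse theory, as a sketch, is to stop claims from taking additional jobs on a single test execute node. This assumes the documented meaning of CLAIM_WORKLIFE (the number of seconds after which a claim stops accepting additional jobs), so a value of 0 should send every new job back through the negotiator, at some cost in throughput:

   # Hypothetical test setting for one execute node: with a value of 0,
   # a claim should not accept another job after its first one.
   CLAIM_WORKLIFE = 0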


For the sake of argument, though, I was wondering how you could force a kind of “match with exactly the number of GPUs I requested”. Am I wrong in thinking that a dynamic slot has a TotalSlotGPUs attribute in its ClassAd, so that a requirements statement in the submit file such as:


Requirements = TotalSlotGPUs == RequestGPUs


would match only slots with exactly the requested number of GPUs, so that no GPUs sit idle?


Is there any downside to doing this aside from the impact to throughput of jobs?



Gianni Pezzarossi

Computational System Analyst

Research Services

Engineering IT Shared Services

University of Illinois @ Urbana-Champaign