
Re: [HTCondor-users] strict dynamic slot matching



Just had another thought: this would not enforce strict matching for a job that did not request any GPUs, i.e. RequestGPUs = undefined.

 

For the requirements solution, it seems to me that the expression would then evaluate to UNDEFINED, and I am unsure what happens with an undefined Requirements. Is it ignored or treated as not a match?

 

For the startd config, the case where RequestGPUs is undefined is handled, but a non-GPU job could still run on a GPU slot.

 

The easiest way I could think of to avoid all of this would be to add a job transform that sets RequestGPUs = 0 on any job where RequestGPUs is undefined.
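The three cases above can be sketched with a toy Python model of ClassAd three-valued logic. This is only an illustration, not the HTCondor evaluator: the helper names (`eq`, `or_`, `is_undefined`, `default_request_gpus`) and the sample slot values are made up for the example, and it assumes that `==` yields UNDEFINED when either side is UNDEFINED while `||` yields TRUE as soon as any term is TRUE. As far as I know, a match requires Requirements to evaluate to TRUE, so an UNDEFINED result would behave like "not a match".

```python
# Toy model of ClassAd three-valued logic; not the HTCondor evaluator.
UNDEFINED = None  # stands in for the ClassAd UNDEFINED value


def eq(a, b):
    """Assumed `==` semantics: UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def or_(*terms):
    """Assumed `||` semantics: TRUE wins, otherwise UNDEFINED is contagious."""
    if any(t is True for t in terms):
        return True
    if any(t is UNDEFINED for t in terms):
        return UNDEFINED
    return False


def is_undefined(a):
    """`is undefined` always gives a definite True or False."""
    return a is UNDEFINED


def job_requirements(slot, job):
    # TARGET.DynamicSlot is undefined || TotalSlotGPUs == RequestGPUs
    return or_(is_undefined(slot.get("DynamicSlot")),
               eq(slot.get("TotalSlotGPUs"), job.get("RequestGPUs")))


def startd_start(slot, job):
    # DynamicSlot is undefined || RequestGPUs is undefined || GPUs == RequestGPUs
    return or_(is_undefined(slot.get("DynamicSlot")),
               is_undefined(job.get("RequestGPUs")),
               eq(slot.get("GPUs"), job.get("RequestGPUs")))


def default_request_gpus(job):
    """Hypothetical stand-in for the job transform: undefined RequestGPUs becomes 0."""
    patched = dict(job)
    if patched.get("RequestGPUs") is UNDEFINED:
        patched["RequestGPUs"] = 0
    return patched


# Hypothetical leftover 2-GPU dynamic slot and a job that never set RequestGPUs.
gpu_dynamic_slot = {"DynamicSlot": True, "GPUs": 2, "TotalSlotGPUs": 2}
cpu_only_job = {}

req_before = job_requirements(gpu_dynamic_slot, cpu_only_job)
start_before = startd_start(gpu_dynamic_slot, cpu_only_job)

patched_job = default_request_gpus(cpu_only_job)
req_after = job_requirements(gpu_dynamic_slot, patched_job)
start_after = startd_start(gpu_dynamic_slot, patched_job)

print(req_before, start_before, req_after, start_after)
```

In this model the job-side expression comes out UNDEFINED and the START expression comes out TRUE while RequestGPUs is undefined; once RequestGPUs is defaulted to 0, both give a definite FALSE for the 2-GPU dynamic slot.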

 

 

-------------------------------------

Gianni Pezzarossi

Computational System Analyst

Research Services

Engineering IT Shared Services

University of Illinois @ Urbana-Champaign

(217)244-7549

engrit-help@xxxxxxxxxxxx

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Pezzarossi, Gianni
Sent: Wednesday, May 4, 2022 4:58 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] strict dynamic slot matching

 

Ah! Good point. Forgot about the partitionable slot.

 

Thanks TJ!

 


 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of John M Knoeller
Sent: Wednesday, May 4, 2022 3:07 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] strict dynamic slot matching

 

You have the right basic idea, but you need a Requirements expression that matches both the partitionable slot and the dynamic slot. The partitionable slot will often have a GPUs and TotalSlotGPUs that is more than RequestGPUs, and you still want to match that, so you need to have your expression apply only to the dynamic slot.

 

Like this

 

   Requirements = TARGET.DynamicSlot is undefined || TotalSlotGPUs == RequestGPUs

Or this, which also takes static slots into account

 

   Requirements = IfThenElse(PartitionableSlot is undefined, GPUs == RequestGPUs, GPUs >= RequestGPUs)

but it might be better to do this as configuration on the execute node.

START = DynamicSlot is undefined || RequestGPUs is undefined || GPUs == RequestGPUs
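To see how these expressions differ across slot types, here is a small Python sketch. It is a toy model only: the function names and the four sample slots (including the assumption that a static slot advertises GPUs and TotalSlotGPUs) are invented for illustration, and every GPU attribute is defined, so plain Python comparisons stand in for ClassAd evaluation.

```python
# Toy model; attribute values below are hypothetical and always defined.

def strict_everywhere(slot, job):
    # Requirements = TotalSlotGPUs == RequestGPUs
    return slot["TotalSlotGPUs"] == job["RequestGPUs"]


def strict_dynamic_only(slot, job):
    # Requirements = TARGET.DynamicSlot is undefined || TotalSlotGPUs == RequestGPUs
    return "DynamicSlot" not in slot or slot["TotalSlotGPUs"] == job["RequestGPUs"]


def strict_unless_partitionable(slot, job):
    # IfThenElse(PartitionableSlot is undefined, GPUs == RequestGPUs, GPUs >= RequestGPUs)
    if "PartitionableSlot" not in slot:
        return slot["GPUs"] == job["RequestGPUs"]
    return slot["GPUs"] >= job["RequestGPUs"]


slots = {
    "partitionable_4gpu": {"PartitionableSlot": True, "GPUs": 4, "TotalSlotGPUs": 4},
    "dynamic_2gpu": {"DynamicSlot": True, "GPUs": 2, "TotalSlotGPUs": 2},
    "static_1gpu": {"GPUs": 1, "TotalSlotGPUs": 1},
    "static_2gpu": {"GPUs": 2, "TotalSlotGPUs": 2},
}
job = {"RequestGPUs": 1}

# For each slot: (original idea, dynamic-only version, IfThenElse version)
results = {
    name: (strict_everywhere(slot, job),
           strict_dynamic_only(slot, job),
           strict_unless_partitionable(slot, job))
    for name, slot in slots.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

In this model the original expression rejects the partitionable slot outright, the dynamic-only version accepts it but also accepts a 2-GPU static slot, and the IfThenElse version accepts the partitionable slot while holding dynamic and static slots to an exact GPU count.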

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Pezzarossi, Gianni
Sent: Wednesday, May 4, 2022 12:16 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] strict dynamic slot matching

 

Hey everyone,

I had an idea, and I am more curious about whether it would work than whether it is a good idea.

We have some users who complain that, very intermittently, a job requesting 1 GPU is matched to a slot with 2 GPUs. I suspect this is simply due to jobs ending before CLAIM_WORKLIFE has expired, which leaves the dynamic slot free to pick up a new job. If the original claim was for 2 GPUs and a 1-GPU job is waiting in the queue, the matchmaker calls it good enough and allows the job to run (I assume requesting resources means “the slot must have at least this much” rather than “must have exactly this much”).

 

I can see why it is done that way: it helps throughput by allowing the most jobs to run rather than trying to optimize resource usage (correct me if I’m wrong).

 

For the sake of argument though, I was wondering how you could force a kind of “match with exactly the number of GPUs I requested”. Am I wrong in thinking that a dynamic slot has the ClassAd attribute TotalSlotGPUs, so that a requirements statement in the submit file of something like:

 

Requirements = TotalSlotGPUs == RequestGPUs

 

Would only match on slots with exactly the requested number of GPUs in order to avoid GPUs sitting idle?

 

Is there any downside to doing this aside from the impact to throughput of jobs?

 
