I had an idea, and I am more curious whether it would work than whether it is a good idea.
We have some users who complain that, intermittently, a job that requests 1 GPU is matched to a slot with 2 GPUs. I suspect this is simply because jobs end before CLAIM_WORKLIFE has expired, which leaves the dynamic slot free to pick up a new job. If the original claim was for 2 GPUs and a 1-GPU job is waiting in the queue, the matchmaker considers it good enough and lets the job run (I assume requesting resources means “the slot must have at least this much” rather than “the slot must have exactly this much”).
I can see why it works this way: it helps throughput by letting as many jobs run as possible, rather than trying to optimize resource usage (correct me if I’m wrong).
For the sake of argument, though, I was wondering how you could force a kind of “match with exactly the number of GPUs I requested” behavior. Am I wrong in thinking that a dynamic slot has the ClassAd attribute TotalSlotGPUs, so that a requirements statement in the submit file such as:
Requirements = TotalSlotGPUs == RequestGPUs
would only match slots with exactly the requested number of GPUs, and so avoid leaving GPUs idle?
Is there any downside to doing this, aside from the impact on job throughput?
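To make the two behaviors concrete, here is a toy Python model (not HTCondor's actual matchmaker) of a claimed 2-GPU dynamic slot that goes idle while a 1-GPU job and a 2-GPU job are queued. The class and function names are hypothetical, the slot simply takes the first queued job that passes the check (a simplification of real job ordering), and partitionable slots are ignored entirely. The `at_least` predicate stands in for what I assume the default behavior is, and `exactly` stands in for the requirements expression above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DynamicSlot:
    """Hypothetical model of a claimed dynamic slot whose job just finished."""
    name: str
    total_slot_gpus: int  # GPUs carved out for the original claim


@dataclass
class Job:
    """Hypothetical model of a job waiting in the queue."""
    job_id: int
    request_gpus: int


Predicate = Callable[[DynamicSlot, Job], bool]


def at_least(slot: DynamicSlot, job: Job) -> bool:
    # Assumed default behavior: the slot must have at least the requested GPUs.
    return slot.total_slot_gpus >= job.request_gpus


def exactly(slot: DynamicSlot, job: Job) -> bool:
    # Mirrors the proposed: Requirements = TotalSlotGPUs == RequestGPUs
    return slot.total_slot_gpus == job.request_gpus


def reuse_claim(slot: DynamicSlot, queue: Sequence[Job],
                predicate: Predicate) -> Optional[Job]:
    """Return the first queued job the idle slot may run, or None if none qualifies."""
    for job in queue:
        if predicate(slot, job):
            return job
    return None


def idle_gpus(slot: DynamicSlot, job: Optional[Job]) -> int:
    """GPUs left unused on the slot; all of them if no job matched."""
    if job is None:
        return slot.total_slot_gpus
    return slot.total_slot_gpus - job.request_gpus


if __name__ == "__main__":
    slot = DynamicSlot("slot1_1", total_slot_gpus=2)
    queue = [Job(101, request_gpus=1), Job(102, request_gpus=2)]
    for name, check in (("at least", at_least), ("exactly", exactly)):
        job = reuse_claim(slot, queue, check)
        picked = job.job_id if job else None
        print(f"{name}: runs job {picked}, idle GPUs = {idle_gpus(slot, job)}")
```

Under the “at least” check the slot runs job 101 and leaves one GPU idle; under the “exactly” check it skips job 101 and runs job 102 instead. If only 1-GPU jobs were queued, the “exactly” check would leave the whole slot idle, which is the throughput cost I mentioned.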
Computational System Analyst
Engineering IT Shared Services
University of Illinois @ Urbana-Champaign