From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Tuesday, May 16, 2017 3:14 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Preventing HTCondor assignment of a given GPU based on GPU state policy?
You can configure
OFFLINE_MACHINE_RESOURCE_GPUS = CUDA0
to prevent HTCondor from assigning that GPU to a slot.
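As a rough illustration of how that setting could be generated from per-GPU utilization figures, here is a minimal Python sketch. The function name `offline_gpus_line`, the 50% threshold, and the input dictionary are hypothetical and not part of HTCondor; the comma-separated form for multiple devices is an assumption that should be checked against the manual for your version.

```python
def offline_gpus_line(utilization_pct, busy_threshold=50):
    """Return an OFFLINE_MACHINE_RESOURCE_GPUS line for the busy GPUs.

    utilization_pct maps device names such as "CUDA0" to a utilization
    percentage. busy_threshold is an illustrative cutoff, not an
    HTCondor default.
    """
    # Keep devices at or above the threshold, ordered by numeric index
    # so that CUDA2 sorts before CUDA10.
    busy = sorted(
        (name for name, pct in utilization_pct.items() if pct >= busy_threshold),
        key=lambda name: int(name[len("CUDA"):]),
    )
    if not busy:
        # Assumption: an empty value leaves every GPU available.
        return "OFFLINE_MACHINE_RESOURCE_GPUS ="
    return "OFFLINE_MACHINE_RESOURCE_GPUS = " + ", ".join(busy)


print(offline_gpus_line({"CUDA0": 100, "CUDA1": 0, "CUDA2": 0, "CUDA3": 0}))
```

The resulting line would still have to be written into the exec node's local configuration and picked up by the startd; how and when that takes effect is not covered by this sketch.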
I’m working on getting a new exec node stood up with multiple GPUs, for use by jobs that need dedicated GPU assignment, a first in our pools. Other jobs I’ve dealt with had an internal lock and queue mechanism to be able to share all the GPUs on the system, so I didn’t need to worry about HTCondor assignments.
I’d like to be able to prevent HTCondor from assigning a GPU that’s already in use by a non-HTCondor process to one of its jobs. I wrote a wrapper for nvidia-smi which pulls in an ad like so:
CUDA0FreeGlobalMemory = 2441
CUDA0UtilizationPct = 100
CUDA1FreeGlobalMemory = 4031
CUDA1UtilizationPct = 0
CUDA2FreeGlobalMemory = 4031
CUDA2UtilizationPct = 0
CUDA3FreeGlobalMemory = 4031
CUDA3UtilizationPct = 0
CUDAFreeGlobalMemory = 14534
CUDAUtilization = 25.0
(This might be a good addition to condor_gpu_discovery, a “-utilization” argument.)
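The wrapper itself is not shown in this message. A minimal Python sketch of how such an ad could be built is given here, assuming the input comes from `nvidia-smi --query-gpu=index,memory.free,utilization.gpu --format=csv,noheader,nounits`. The function names are hypothetical, and the aggregate attributes are computed as a sum and a mean because that is consistent with the sample values above (14534 and 25.0), not because the original wrapper is known to do so.

```python
import csv
import io
import subprocess

# Standard nvidia-smi query options; one "index, free MiB, util %" row per GPU.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.free,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def read_nvidia_smi():
    """Run nvidia-smi and return its CSV output as text."""
    result = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    )
    return result.stdout


def build_gpu_ad(csv_text):
    """Turn 'index, free MiB, util %' rows into ClassAd attribute lines.

    Assumes every field is an integer; devices that report a value
    such as "[N/A]" would need extra handling.
    """
    lines = []
    free_total = 0
    utils = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, free_mib, util_pct = (int(field.strip()) for field in row)
        lines.append(f"CUDA{index}FreeGlobalMemory = {free_mib}")
        lines.append(f"CUDA{index}UtilizationPct = {util_pct}")
        free_total += free_mib
        utils.append(util_pct)
    if utils:
        # Aggregates: total free memory and mean utilization (assumed).
        lines.append(f"CUDAFreeGlobalMemory = {free_total}")
        lines.append(f"CUDAUtilization = {sum(utils) / len(utils):.1f}")
    return lines


sample = "0, 2441, 100\n1, 4031, 0\n2, 4031, 0\n3, 4031, 0\n"
print("\n".join(build_gpu_ad(sample)))
```

On a machine with the NVIDIA driver installed, `build_gpu_ad(read_nvidia_smi())` would produce the live values instead of the sample ones.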
So in the above case, I’d like to prevent any HTCondor job from being assigned the CUDA0 device since it’s 100% used, and preferably advertise one fewer GPU available on the system. Is there any means to do this? I’ve been mulling the kinds of expressions I think I might need and my brain is starting to hurt a bit.
Michael V. Pelletier
Future Technologies & Cloud
Integrated Defense Systems
+1 978-858-9681 (office)
+1 339-293-9149 (cell)
7-225-9681 (tie line)
50 Apple Hill Drive
Tewksbury, MA 01876 USA