
Re: [HTCondor-users] multi-gpu-nodes limit access per slot



On 12/10/2019 10:52 AM, Beyer, Christoph wrote:
> Hi,
> 
> I have one node with 4 GPUs and wonder whether there is a way to limit GPU usage per slot, for example 4 slots that each see and access only a single GPU. Are cgroups the way to do this, and if so, how is it configured?
> 
> Best
> Christoph
> 

Maybe on this node just configure HTCondor with four static slots, each
with one GPU and a share of the CPU and RAM? If you need partitionable
slots for some reason (e.g. for RAM), you could edit your START
expression so that only jobs requesting 0 or 1 GPUs are matched.

As for restricting access to the GPUs, HTCondor will set the
CUDA_VISIBLE_DEVICES environment variable (and the OpenCL equivalent) to
point to the GPU provisioned to that slot. This environment variable is
honored by the low-level CUDA libraries. Are you worried about GPU codes
that purposefully ignore or clear this environment variable?
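
As a rough illustration of what a job sees, here is a minimal Python
sketch that reads CUDA_VISIBLE_DEVICES from inside a job. The helper
name visible_gpus is made up for this example and is not part of
HTCondor or CUDA; the exact form of the identifiers HTCondor writes
into the variable may differ between versions, so the sketch keeps
them as plain strings.

```python
import os


def visible_gpus(env=None):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only.
    Returns None if the variable is unset (CUDA then exposes every GPU
    on the machine) and an empty list if it is set but empty (CUDA then
    exposes no GPU at all).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are comma-separated; keep them as strings because they
    # may be numeric indices or UUID-style names.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]
```

Calling visible_gpus() at the start of a job and logging the result is
a cheap way to check that each slot really is handed a single device.
Note that this only reports what the environment asks for; it does not
stop a program from clearing the variable.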

regards
Todd