[HTCondor-users] Selecting or partitioning GPUs



We have a single machine with 40 CPU cores, 188 GiB of memory and 10 GPUs.

Ideally, I would like to configure this machine as a single partitionable slot, with cores, memory and GPUs allocated to dynamic slots as needed.

Right now, this is the content of /etc/condor/config.d/30-dynamic:

NUM_SLOTS=1
NUM_SLOTS_TYPE_1=1
SLOT_TYPE_1=100%
SLOT_TYPE_1_PARTITIONABLE=true
JOB_DEFAULT_REQUESTMEMORY=4800M

And this is the content of /etc/condor/config.d/60-dynamic:

@use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra -properties

This mostly works, and resources are allocated as expected. However, it is impossible to select on CUDADeviceName or CUDACapability: the GPUs are heterogeneous, so these properties are advertised per device on the same slot (CUDA0Capability, CUDA1Capability, CUDA2Capability, ...) rather than as a single slot-wide attribute. Is there a way to request a specific capability or device name in the submit file while keeping a single partitionable slot?
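
For illustration, a submit file along these lines shows the kind of request in question. It is only a sketch: the executable name is a placeholder and the capability threshold is arbitrary. It assumes a slot-wide CUDACapability attribute, which this machine does not advertise, so the requirements expression never evaluates to true and the job stays idle:

# Hypothetical submit file; my_gpu_job.sh and the 6.0 threshold are placeholders
executable     = my_gpu_job.sh
request_cpus   = 1
request_memory = 4800M
request_gpus   = 1
# Assumes a slot-wide CUDACapability attribute. Here only CUDA0Capability,
# CUDA1Capability, ... are advertised, so this expression is undefined.
requirements   = (CUDACapability >= 6.0)
queue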

If this is not possible with a single partitionable slot, I would like to create 3 separate partitionable slots, each containing GPUs of a single type. How can I achieve this?

Thank you!

We are using HTCondor 8.8.