Re: [HTCondor-users] assigning multiple GPUs to a single slot


Since you are not using partitionable slots, the GPUs will be permanently assigned to slots when the STARTD starts.

If you have only 2 GPUs and want 2 slots, each slot will be assigned only a single GPU, so jobs that want more than 1 GPU will not match any of your slots.

You can instead assign both GPUs to one of your two slots, so that slot will match jobs that want 1 or 2 GPUs.

SLOT_TYPE_1 = cpus=1, GPUs=2, mem=auto

NUM_SLOTS_TYPE_1 = 1

SLOT_TYPE_2 = cpus=1, mem=auto

NUM_SLOTS_TYPE_2 = 1
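
With that layout, a job that needs both GPUs would ask for them in its submit file as before, and should then match only the type 1 slot:

request_GPUs = 2

To check how the GPUs ended up assigned after restarting the startd, something like the following should list the GPU count per slot (this assumes the slot ads advertise a GPUs attribute, as they normally do with "use feature: GPUs"):

condor_status -af Name Cpus GPUs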

Or you can switch to using partitionable slots and let HTCondor decide how to divide up resources based on what the jobs request. Be aware that if you do this, the 1-GPU jobs will tend to dominate (if you have an infinite supply of them), since once a 1-GPU job starts, the remainder of the partitionable slot will only match 1-GPU jobs.
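
A minimal sketch of the partitionable setup, assuming you want a single slot that owns all of the machine's resources (adjust this if you want to reserve some for other slot types):

SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1

Jobs then carve dynamic slots out of it according to their request_cpus, request_memory and request_GPUs.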

-tj

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Francisco Pereira
Sent: Monday, January 18, 2016 1:39 PM
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] assigning multiple GPUs to a single slot

Hi,

I have been scheduling GPU jobs in our cluster by

1) setting in the config file for each node

"use feature: GPUs"

"GPU_DISCOVERY_EXTRA = -extra"

(as suggested in the documentation, running condor_gpu_discovery -properties manually produces the right results for each machine)

2) setting up a number of slots with 1 CPU each, e.g. in a 2-GPU machine.

"SLOT_TYPE_1 = cpus=1,mem=auto

SLOT_TYPE_1_PARTITIONABLE = FALSE

NUM_SLOTS_TYPE_1 = 2"

When submitting jobs that have "request_GPUs=1" in the submit file the jobs get scheduled to machines that have a GPU, and there are no more jobs being scheduled than there are GPUs, across multiple machines. However, when I specify "request_GPUs=2", the job stays in the queue with status "I", even though the requested number is available.

Hence, I am wondering what I am doing wrong and whether I have incorrectly set up the basic mechanism in #2. The GPU discovery works beautifully, so I suspect I am overcomplicating ...

thank you for your help!

Francisco
