
Re: [HTCondor-users] assigning multiple GPUs to a single slot

Since you are not using partitionable slots, the GPUs will be permanently assigned to slots when the STARTD starts.

If you have only 2 GPUs and want 2 slots, each slot will be assigned only a single GPU, so jobs that want more than 1 GPU will not match any of your slots.


You can instead assign both GPUs to one of your two slots, so that slot will match jobs that want 1 or 2 GPUs.


SLOT_TYPE_1 = cpus=1, GPUs=2, mem=auto


SLOT_TYPE_2 = cpus=1, mem=auto
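
For context, a fuller version of that static-slot configuration might look like the sketch below. The two NUM_SLOTS_TYPE_<N> lines are my assumption about how many slots of each type you want (one each); they are not part of the lines above, so adjust them to your machine.

```
# Detect GPUs and advertise them as a machine resource
use feature : GPUs

# One slot that owns both GPUs (assumed count: 1)
SLOT_TYPE_1 = cpus=1, GPUs=2, mem=auto
NUM_SLOTS_TYPE_1 = 1

# One CPU-only slot (assumed count: 1)
SLOT_TYPE_2 = cpus=1, mem=auto
NUM_SLOTS_TYPE_2 = 1
```

A job that should match the first slot would then request both GPUs in its submit file. The executable name here is hypothetical:

```
# my_gpu_job.sh is a placeholder for your own program
executable   = my_gpu_job.sh
request_cpus = 1
request_GPUs = 2
queue
```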



Or you can switch to using partitionable slots and let HTCondor decide how to divide up resources based on what the jobs request. Be aware that if you do this, the 1-GPU jobs will tend to dominate (if you have an infinite supply of them), since once a 1-GPU job starts, the remainder of the partitionable slot will only match 1-GPU jobs.
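
A minimal sketch of the partitionable-slot alternative, assuming you want a single slot that holds all of the machine's resources (the 100% split and the single slot are my assumptions; check them against the manual for your HTCondor version):

```
# Detect GPUs and advertise them as a machine resource
use feature : GPUs

# One partitionable slot holding all of the machine's resources
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

With this setup, each job carves a dynamic slot out of the partitionable slot according to its request_cpus, request_memory and request_GPUs values, so both 1-GPU and 2-GPU jobs can match as long as enough GPUs remain unclaimed.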




From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Francisco Pereira
Sent: Monday, January 18, 2016 1:39 PM
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] assigning multiple GPUs to a single slot




I have been scheduling GPU jobs in our cluster by 


1) setting in the config file for each node


"use feature: GPUs"



(as suggested in the documentation, running condor_gpu_discovery -properties manually produces the right results for each machine)


2) setting up a number of slots with 1 CPU each, e.g. in a 2-GPU machine.


"SLOT_TYPE_1 = cpus=1,mem=auto"




When submitting jobs that have "request_GPUs=1" in the submit file the jobs get scheduled to machines that have a GPU, and there are no more jobs being scheduled than there are GPUs, across multiple machines. However, when I specify "request_GPUs=2", the job stays in the queue with status "I", even though the requested number is available.


Hence, I am wondering what I am doing wrong and whether I have incorrectly set up the basic mechanism in #2. The GPU discovery works beautifully, so I suspect I am overcomplicating ... 


thank you for your help!