
Re: [HTCondor-users] Adding GPUs to machine resources



On Wed, Apr 16, 2014 at 01:31:08PM +0200, Steffen Grunewald wrote:
> On Wed, Mar 12, 2014 at 04:06:46PM +0100, Steffen Grunewald wrote:
> > 
> > Following http://spinningmatt.wordpress.com/2012/11/19, I have tried
> > to add two GPUs to the resources available to a standalone machine
> > with a number of CPU cores, by defining in condor_config.d/gpu:
> > 
> > MACHINE_RESOURCE_NAMES    = GPUS
> > MACHINE_RESOURCE_GPUS     = 2
> > 
> > SLOT_TYPE_1               = cpus=100%,auto
> > SLOT_TYPE_1_PARTITIONABLE = TRUE
> > NUM_SLOTS_TYPE_1          = 1
> > 
> > ENVIRONMENT_FOR_AssignedGpus = CUDA_VISIBLE_DEVICES
> 
> I'll proceed with MACHINE_RESOURCE_INVENTORY_GPUS, and work my
> way through the rest of the configuration...

Successful.

If I set request_gpus > 0, I get the corresponding resources assigned
(test batch: 2 jobs with request_gpus set to 0, 4 with 1, 2 with 2;
still running):

$ grep -e CUDA_VIS -e Assig *.out
2.out:_CONDOR_AssignedGPUS=CUDA0
2.out:CUDA_VISIBLE_DEVICES=0
2.out:AssignedGPUS = "CUDA0"
3.out:_CONDOR_AssignedGPUS=CUDA1
3.out:CUDA_VISIBLE_DEVICES=1
3.out:AssignedGPUS = "CUDA1"
4.out:_CONDOR_AssignedGPUS=CUDA0
4.out:CUDA_VISIBLE_DEVICES=0
4.out:AssignedGPUS = "CUDA0"
5.out:_CONDOR_AssignedGPUS=CUDA1
5.out:CUDA_VISIBLE_DEVICES=1
5.out:AssignedGPUS = "CUDA1"
6.out:_CONDOR_AssignedGPUS=CUDA0,CUDA1
6.out:CUDA_VISIBLE_DEVICES=0,1
6.out:AssignedGPUS = "CUDA0,CUDA1"

(so this also works for request_gpus=2, fine)

But: If the user "forgets" to specify request_gpus (or sets it to 0),
then CUDA_VISIBLE_DEVICES isn't set *which apparently leaves full access
to _all_ GPU resources of the machine*. Is this intended? I'd expect 
something like CUDA_VISIBLE_DEVICES=-1 ...
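
As a possible stopgap, a job wrapper along the following lines could hide
the GPUs from jobs that did not get any assigned. This is only a sketch
under assumptions: it relies on _CONDOR_AssignedGPUS being present in the
job environment only when GPUs were assigned (as the grep output above
suggests), and on CUDA treating an invalid index such as -1 in
CUDA_VISIBLE_DEVICES as "no devices visible". The function name
hide_unassigned_gpus is made up for illustration.

```shell
#!/bin/sh
# Hypothetical job wrapper: hide all GPUs from jobs that were assigned none.
# Assumption: _CONDOR_AssignedGPUS is set only when GPUs were assigned.

hide_unassigned_gpus() {
    if [ -z "${_CONDOR_AssignedGPUS:-}" ]; then
        # No GPU assigned: an invalid index should leave no device visible.
        CUDA_VISIBLE_DEVICES=-1
        export CUDA_VISIBLE_DEVICES
    fi
}

hide_unassigned_gpus

# Run the real job (passed as arguments) with the adjusted environment.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Such a script would presumably be hooked in via USER_JOB_WRAPPER in the
execute node's configuration. Note that this only keeps honest jobs
honest; a job can still reset CUDA_VISIBLE_DEVICES itself.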

Still running 8.1.4

- S