
Re: [HTCondor-users] Docker universe + GPU (CUDA)+pytorch

Hi Joseph.
You should add it to DOCKER_EXTRA_ARGUMENTS in the HTCondor configuration on the execute machine.
If I recall correctly, you also need to install an NVIDIA Docker extension (for example, the NVIDIA Container Toolkit) for this feature to work.
One last thing: use an NVIDIA Docker image as the base image. It already has some of the needed environment variables set.

I haven't done this for a long time, but I have it working on my cluster, so it should be OK.
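
A minimal sketch of the execute-machine setting described above, assuming the NVIDIA Container Toolkit is installed so that Docker accepts the --gpus flag. The value is an assumption, not a configuration tested in this thread. Note that --gpus all exposes every GPU on the host to every docker universe job, regardless of what the job requested.

```
# Local HTCondor configuration on the execute machine (assumed value).
# Extra options that HTCondor adds when it creates docker universe containers.
DOCKER_EXTRA_ARGUMENTS = --gpus all
```

After changing the value, run condor_reconfig on that machine so the new setting is picked up.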



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Fulem Josef <fulemj@xxxxxxxxxx>
Sent: Monday, November 1, 2021, 18:37
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Docker universe + GPU (CUDA)+pytorch


Currently I'm trying to use a Docker container with the HTCondor docker universe to run a PyTorch application that requires a GPU (CUDA).

When I run it in the vanilla universe, it works and CUDA is available.

When I run this command: 
condor_status -constraint  '!isUndefined(DetectedGPUs)' -compact  -af CUDADeviceName DetectedGPUs

then this is the output:
GeForce RTX 2070 SUPER GPU-d4decf4f, GPU-2a518ecd

Also, I have this in my htcondor config: 
use feature : GPUs

So it looks like condor_gpu_discovery works OK.

When I build my Docker image and run it with --gpus all or --gpus device=0,
CUDA is available and the application running in the container can use it.

But when I run the same Docker image through HTCondor's docker universe, the GPUs are not accessible, even though a GPU is requested.
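
The submit description is not included in this message. As a purely hypothetical illustration of a docker universe job that requests a GPU, it might look like the sketch below; the image name, script name, and output file names are placeholders, not taken from the original setup.

```
# Hypothetical submit description; all names are placeholders.
universe                = docker
docker_image            = my-pytorch-image:latest
# Absolute path, so the interpreter is expected to exist inside the image.
executable              = /usr/bin/python3
arguments               = train.py
transfer_input_files    = train.py
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Ask the matchmaker for a slot with one GPU.
request_gpus            = 1
output                  = job.out
error                   = job.err
log                     = job.log
queue
```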

It looks like the docker run command is missing the --gpus flag. Is it possible to pass this flag to Docker somehow?

Thank you very much for any suggestion or help.

Best Regards.