
Re: [HTCondor-users] Docker universe + GPU (CUDA)+pytorch



Hi Josef,
You should add it to DOCKER_EXTRA_ARGUMENTS on the execute machine.
If I recall correctly, you need to install an NVIDIA Docker extension to have this feature.
One last thing: use an NVIDIA Docker image as the base image. Some environment variables are already set in that image.
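
A minimal sketch of what that setting could look like in the HTCondor configuration on the execute node. It assumes a Docker version that supports the --gpus option and that the NVIDIA container support is installed on that machine. The value --gpus all is an assumption borrowed from the flag mentioned in the original question, not a tested recommendation:

```
# Local HTCondor config on the execute node.
# Extra options appended when HTCondor creates the container.
# Assumption: exposing all GPUs to every docker universe job is acceptable here.
DOCKER_EXTRA_ARGUMENTS = --gpus all
```

After changing the configuration, running condor_reconfig on the execute node should make the startd pick it up. Note that this applies to every docker universe job on that machine, not only to jobs that request a GPU.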

I haven't done this for a long time, but I have it working on my cluster,
so it should be OK.

David




From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Fulem Josef <fulemj@xxxxxxxxxx>
Sent: Monday, November 1, 2021, 18:37
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Docker universe + GPU (CUDA)+pytorch

Hello,

I'm currently trying to use a Docker container with the HTCondor docker universe to run a PyTorch application that requires a GPU (CUDA).

When I run it in the vanilla universe, it works and CUDA is available.

When I run this command: 
condor_status -constraint  '!isUndefined(DetectedGPUs)' -compact  -af CUDADeviceName DetectedGPUs

then this is the output:
GeForce RTX 2070 SUPER GPU-d4decf4f, GPU-2a518ecd

Also, I have this in my htcondor config: 
use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra


So it looks like condor_gpu_discovery works OK.

When I build my Docker image and run it directly with --gpus all or --gpus device=0,
CUDA is available and the application running in the container can use it.

But when I run the same Docker image via HTCondor in the docker universe, the GPUs are not accessible, even though a GPU is requested.

It looks like the docker run command is missing the --gpus flag. Is it possible to pass this flag to Docker somehow?
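
For context, a docker universe submit file that requests a GPU has roughly the following shape. This is only an illustrative sketch: the image name, executable, resource values and file names are hypothetical placeholders, not the ones actually used for this job:

```
# Hypothetical docker universe submit file; every name below is a placeholder.
universe                = docker
docker_image            = my-pytorch-image:latest
executable              = run_training.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Ask the negotiator for a slot with one GPU.
request_gpus            = 1
request_cpus            = 1
request_memory          = 4GB
output                  = job.out
error                   = job.err
log                     = job.log
queue
```

With request_gpus set, the job is matched to a slot that has a GPU, but that alone does not guarantee the device is exposed inside the container.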

Thank you very much for any suggestion or help.

Best Regards.

Josef