I'm currently trying to use a Docker container with the HTCondor docker universe to run an application (PyTorch) that requires a GPU (CUDA).
When I run it in the vanilla universe, it works OK and CUDA is available.
When I run this command:
condor_status -constraint '!isUndefined(DetectedGPUs)' -compact -af CUDADeviceName DetectedGPUs
then this is the output:
GeForce RTX 2070 SUPER GPU-d4decf4f, GPU-2a518ecd
Also, I have this in my HTCondor config:
use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra
So it looks like condor_gpu_discovery works OK.
When I build my Docker image and run it with --gpus all or --gpus device=0,
CUDA is available and the application running in the container can use it.
But when I run the same Docker image via HTCondor in the docker universe, the GPUs are not accessible, even though a GPU is requested.
It looks like the docker run command is missing the --gpus flag. Is it possible to pass this flag to Docker somehow?
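To compare the environments, a small diagnostic along these lines could be run as the job executable in both the vanilla and the docker universe. This is only a minimal sketch: the function name cuda_report is hypothetical, it assumes PyTorch may or may not be importable in the image, and the two environment variables are the ones I would expect HTCondor and the NVIDIA container runtime to set, which is an assumption about this setup.

```python
import os
import shutil


def cuda_report():
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        # Assumption: typically set for GPU jobs by HTCondor / the NVIDIA runtime
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Path to nvidia-smi if it is on PATH inside this environment, else None
        "nvidia_smi_path": shutil.which("nvidia-smi"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here, so CUDA cannot be checked through it
        report["torch_installed"] = False
        report["cuda_available"] = False
        report["device_count"] = 0
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key} = {value}")
```

If the two runs differ only in cuda_available and the device count, that would be consistent with the container being started without GPU access rather than with a problem in the image itself.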
Thank you very much for any suggestion or help.