
Re: [HTCondor-users] condor_ssh_to_job inside containers with GPU support



On 4/22/20 10:07 AM, Kenyi Hurtado Anampa wrote:

Hello,

We are submitting condor jobs that use Singularity containers. The startds use the --nv option to bring GPU support inside the containers for machine learning applications:

SINGULARITY_EXTRA_ARGUMENTS = --nv
SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage

This works great; however, when we use condor_ssh_to_job, we lose the environment related to libcuda (what --nv sets up), see [1]. Could it be that condor does not use --nv when entering the container?


Hi Kenyi:


When condor_ssh_to_job lands on a Singularity job, it calls /usr/bin/nsenter to enter the container. This is because Singularity provides no good way for an arbitrary process to enter another container using only the Singularity tools. nsenter enters the mount namespace of the Singularity container, which is where I thought --nv did its setup.
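One way to narrow this down is to compare the condor_ssh_to_job shell against the job's own process. Here is a minimal sketch, assuming a Linux host with /proc and that you know the PID of the job's payload process as seen from that shell. The helper names (mount_ns, read_environ, env_diff) are hypothetical and are not part of HTCondor or Singularity. Joining a process's namespaces does not copy that process's environment variables, so if the mount namespaces match but the job exported variables the shell lacks (for instance library search paths, if --nv sets any), they would show up in the diff.

```python
import os
import sys


def mount_ns(pid: str = "self") -> str:
    """Return the mount-namespace identifier of a process, e.g. 'mnt:[4026531840]'."""
    return os.readlink(f"/proc/{pid}/ns/mnt")


def read_environ(pid: str = "self") -> dict[str, str]:
    """Parse /proc/<pid>/environ (NUL-separated KEY=VALUE entries) into a dict.

    Note: this is the environment the process was started with; later
    changes made inside that process are not reflected here.
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env: dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def env_diff(job_pid: str) -> dict[str, str]:
    """Variables set in the job process that are missing or different in this shell."""
    job_env = read_environ(job_pid)
    return {k: v for k, v in job_env.items() if os.environ.get(k) != v}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python3 compare_job.py <job_pid>
    job_pid = sys.argv[1]
    print(f"same mount namespace: {mount_ns(job_pid) == mount_ns()}")
    for key, value in sorted(env_diff(job_pid).items()):
        print(f"{key}={value}")
```

Reading another process's /proc/<pid>/environ normally requires running as the same user (or root), which should hold for a shell opened by condor_ssh_to_job on the user's own job.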

-greg