
Re: [HTCondor-users] condor_ssh_to_job inside containers with GPU support

On 4/22/20 10:07 AM, Kenyi Hurtado Anampa wrote:


We are submitting condor jobs that use singularity containers. The startds use the --nv feature, in order to bring GPU support inside the containers for Machine Learning applications:

SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)

This works great; however, when we use condor_ssh_to_job, we lose the environment related to libcuda (what --nv does), see [1]. Could it be that condor does not use --nv when entering the container?

Hi Kenyi:

When condor_ssh_to_job lands on a singularity job, it ends up calling /usr/bin/nsenter to enter the container. This is because singularity provides no good way for an arbitrary process to enter an existing container using just the singularity tools. nsenter enters the mount namespace of the singularity container, which is what I thought --nv set up.
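
One way to narrow this down is to check whether the ssh session differs from the job in its environment variables rather than in its mounts. The following is only a rough sketch, assuming a Linux /proc filesystem and that you can read the job process's /proc/<pid>/environ (same user or root). The function names and the choice of LD_LIBRARY_PATH as a variable to compare are my own assumptions for illustration, not anything condor or singularity provides.

```python
import os


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        # Skip empty trailing entries and anything without a '='.
        if not entry or b"=" not in entry:
            continue
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid: int) -> dict:
    """Read the initial environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())


def missing_or_changed(job_env: dict, session_env: dict, names) -> dict:
    """Return {name: (job_value, session_value)} for variables that differ."""
    diffs = {}
    for name in names:
        job_val = job_env.get(name)
        sess_val = session_env.get(name)
        if job_val != sess_val:
            diffs[name] = (job_val, sess_val)
    return diffs


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the PID of the job process as seen from
    # inside the ssh session.
    job_pid = int(sys.argv[1])
    job_env = read_process_environ(job_pid)
    for name, (job_val, sess_val) in missing_or_changed(
        job_env, dict(os.environ), ["LD_LIBRARY_PATH", "PATH"]
    ).items():
        print(f"{name}: job={job_val!r} session={sess_val!r}")
```

If the variables match but libcuda is still not found, the difference is more likely in the mounts than in the environment.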