
Re: [HTCondor-users] condor_ssh_to_job inside containers with GPU support



Hello Oliver and Greg,


  # Work around missing PTY
  script /dev/null
  # Re-set home directory (of course, this needs to be adapted):
  export HOME=/jwd
  # Re-source /etc/profile:
  source /etc/profile
  # Fixup TERM
  export TERM=linux
  # And here's the magic trick for CUDA:
  export LD_LIBRARY_PATH=/.singularity.d/libs/

The explanation for the last line: Singularity binds the CUDA libraries (and other things, such as the AMD libraries if you use those) to
/.singularity.d/libs/ inside the container, so you need to adjust LD_LIBRARY_PATH to find them. Singularity injects this path into the environment
when it starts processes inside the container, but nsenter has no way to extract that environment.
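
As a quick sanity check inside the condor_ssh_to_job session (a minimal sketch; the exact library names depend on your driver, and my_cuda_app is just a hypothetical binary):

  # Confirm the driver libraries are bound into the container:
  ls /.singularity.d/libs/            # typically libcuda.so, libnvidia-ml.so, ...
  # With the search path extended, the dynamic loader resolves them again:
  export LD_LIBRARY_PATH=/.singularity.d/libs/
  ldd ./my_cuda_app | grep -i 'libcuda\|not found'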

Now I guess you are asking as a cluster admin, and not as a user, right?
As a cluster admin, what we did was to put all the "export" fixups into /etc/profile.d/somefile. This means our users only need to do:
  script /dev/null
  source /etc/profile
and they are good to go.
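
A minimal sketch of what such a file could look like (the file name is hypothetical, and the HOME value is our site-specific job working directory, so adapt both):

  # /etc/profile.d/condor-ssh-fixup.sh  (example name)
  # Apply the fixups only inside a Singularity container started with GPU
  # support, i.e. when the library bind directory exists:
  if [ -d /.singularity.d/libs ]; then
      export HOME=/jwd                               # adapt to your site
      export TERM=linux
      export LD_LIBRARY_PATH=/.singularity.d/libs/
  fi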


Flawless victory! Thank you so much for the explanation and the workaround, you just saved us so much time digging into this.

Best regards,
Kenyi