
Re: [HTCondor-users] Singularity container creation failed: can't remount /user/path: no such file or directory



Bryce:


I don't think this is the problem, but be aware that we've seen problems with Singularity when the :rw permission is set explicitly on a mount. Some versions seem to work only with :ro or with no option at all (which means :rw).
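To try the :rw note without touching anything else, the bind list can be normalized so that no entry carries an explicit :rw. This is a minimal sketch that assumes the comma-separated src[:dst[:opts]] form used in the SINGULARITY_BIND_EXPR quoted further down; drop_explicit_rw is a hypothetical helper, not part of HTCondor or Singularity.

```python
def drop_explicit_rw(bind_expr: str) -> str:
    """Rewrite a Singularity bind list, removing explicit ':rw' options.

    Hypothetical helper. Each comma-separated entry is assumed to be
    src[:dst[:opts]]. Entries whose options are exactly 'rw' lose that
    suffix (a bind with no options is read-write); ':ro' and any other
    options are left untouched.
    """
    cleaned = []
    for entry in bind_expr.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        parts = entry.split(":")
        if len(parts) == 3 and parts[2] == "rw":
            parts = parts[:2]  # keep src:dst, drop the explicit :rw
        cleaned.append(":".join(parts))
    return ",".join(cleaned)


if __name__ == "__main__":
    current = ("/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,"
               "/localscratch:/localscratch:rw")
    print(drop_explicit_rw(current))
```

For the configuration in this thread it yields SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org,/localscratch:/localscratch". Whether that helps likely depends on the Singularity version installed on the execute nodes.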


This error looks like a recursive mount.  If you remove the getenv = true and the SINGULARITY_PWD, does it work?
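One way to check the recursive-mount theory is to ask whether the path in the error already lies inside an existing bind. In Bryce's error that path is the submit directory, /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor, which sits under /ligo/home/ligo.org, a path the execute-node config already binds. A second mount of it would be nested inside the first. That is consistent with the "destination is already in the mount point list" warning he reports below, though it does not prove the cause. The sketch below is only a path-prefix comparison under that assumption. bind_destinations and enclosing_bind are hypothetical helpers and do not model Singularity's actual mount handling.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def bind_destinations(bind_expr: str) -> List[str]:
    """Return the in-container destination of each src[:dst[:opts]] entry."""
    dests = []
    for entry in bind_expr.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        # With no explicit destination, the source path is reused in the container.
        dests.append(parts[1] if len(parts) > 1 and parts[1] else parts[0])
    return dests


def enclosing_bind(path: str, bind_expr: str) -> Optional[str]:
    """Return the first bind destination equal to or containing `path`, else None.

    Pure path-prefix comparison; it does not reproduce Singularity's logic.
    """
    target = PurePosixPath(path)
    for dest in bind_destinations(bind_expr):
        mount_point = PurePosixPath(dest)
        if target == mount_point or mount_point in target.parents:
            return dest
    return None


if __name__ == "__main__":
    expr = ("/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,"
            "/localscratch:/localscratch:rw")
    submit_dir = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor"
    print(enclosing_bind(submit_dir, expr))  # /ligo/home/ligo.org
```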


-greg

On 9/18/20 9:04 AM, Cousins, Bryce S wrote:
Hello,

I run an HTCondor pool (v8.8.10) and would like to enable user-defined Singularity images for submitted jobs, but my test jobs fail with errors of the form:

    FATAL:   container creation failed: mount ->/user/path error: can't remount /user/path: no such file or directory

The Singularity image itself is fine, and executes without issues on the login or compute nodes; the error occurs only when submitting an HTCondor Singularity job.

I set up the Singularity compute nodes with the following configuration, based on the Singularity Support docs:

# /etc/condor/config.d/70-singularity.conf
SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage
SINGULARITY_TARGET_DIR = /srv
SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,/localscratch:/localscratch:rw"
SINGULARITY_IS_SETUID = false

HAS_SINGULARITY = HasSingularity
STARTD_ATTRS = $(STARTD_ATTRS),HAS_SINGULARITY
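Because the failure is "no such file or directory", a quick sanity check is to confirm on an execute node that every host-side source in the bind list actually exists there. Ideally run it as the job's user, since a path the user cannot stat reports as missing. This is a minimal sketch assuming the src[:dst[:opts]] form; bind_sources and missing_bind_sources are hypothetical helpers. Note that the path in the error above is the submit directory rather than one of the listed sources, so a clean result rules out only the simplest explanation.

```python
import os
from typing import List


def bind_sources(bind_expr: str) -> List[str]:
    """Return the host-side source path of each src[:dst[:opts]] entry."""
    return [e.strip().split(":")[0] for e in bind_expr.split(",") if e.strip()]


def missing_bind_sources(bind_expr: str) -> List[str]:
    """List bind sources that do not exist (or cannot be stat'ed) on this host."""
    # os.path.exists also returns False when permissions prevent the stat.
    return [src for src in bind_sources(bind_expr) if not os.path.exists(src)]


if __name__ == "__main__":
    # Same value as SINGULARITY_BIND_EXPR above, without the outer quotes.
    expr = ("/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,"
            "/localscratch:/localscratch:rw")
    missing = missing_bind_sources(expr)
    print("missing bind sources:", missing if missing else "none")
```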

A test submit file is:

# /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/test.sub
universe = vanilla
executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
getenv = True
environment = "SINGULARITY_PWD=/ligo/home/ligo.org/bryce.cousins/git.ligo/gstlal/tacc"
+SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
error = $(cluster)-$(process).err
queue 1

Submitting this job leads to an error:
FATAL:   container creation failed: mount ->/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor error: can't remount /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor: no such file or directory

I'm not sure what the root cause is, since the `/ligo/home/ligo.org/` NFS directory is bound in the compute node config. Other changes I have tried that still produce the same FATAL error:
  • binding the full path in the compute node config, which produces the warning "destination is already in the mount point list"
  • removing the environment variables from the submit file
Is there some other configuration change (either in the .sub file or on the compute node) that would work?

Thank you for any guidance.

Bryce

-----

Bryce Cousins

LIGO R&D Engineer

Penn State Institute for Computational and Data Sciences

bfc5288@xxxxxxx

814-867-3035


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/