Bryce:
I don't think this is the problem, but be aware that we've seen problems with Singularity when we've explicitly set the :rw permission on a mount. Some versions seem to only work with either :ro or no suffix at all (which means :rw).
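As a rough sketch of that, here is the SINGULARITY_BIND_EXPR from the configuration quoted below with the explicit :rw suffixes dropped. Everything else is unchanged, and this is only something to try, not a confirmed fix:

```
# /etc/condor/config.d/70-singularity.conf (fragment)
# Same bind list as in the original message, but without the explicit :rw
# suffix; a mount with no suffix is treated as read-write.
SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org,/localscratch:/localscratch"
```

The startd on the compute node would need to pick up the change (for example via condor_reconfig) before resubmitting the test job.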
This error looks like a recursive mount. If you remove the getenv = true and the SINGULARITY_PWD, does it work?
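For reference, a minimal sketch of the test submit file quoted below with those two settings removed. The paths are copied from the original message and nothing else is changed:

```
# test.sub without "getenv = True" and without the SINGULARITY_PWD environment entry
universe = vanilla
executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
+SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
error = $(cluster)-$(process).err
queue 1
```

If this version runs, adding the two settings back one at a time should show which of them triggers the mount error.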
-greg
Hello,
I run an HTCondor pool v8.8.10 and would like to enable user-defined Singularity images for submitted jobs, but when submitting test jobs I'm running into issues of the form:
FATAL: container creation failed: mount ->/user/path error: can't remount /user/path: no such file or directory
The Singularity image itself is fine, and executes without issues on the login or compute nodes; the error occurs only when submitting an HTCondor Singularity job.
I set up the Singularity compute nodes with the following configuration, based on the Singularity Support docs:
# /etc/condor/config.d/70-singularity.conf
SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage
SINGULARITY_TARGET_DIR = /srv
SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,/localscratch:/localscratch:rw"
SINGULARITY_IS_SETUID = false

HAS_SINGULARITY = HasSingularity
STARTD_ATTRS = $(STARTD_ATTRS),HAS_SINGULARITY
A test submit file is:
# /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/test.sub
universe = vanilla
executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
getenv = True
environment = "SINGULARITY_PWD=/ligo/home/ligo.org/bryce.cousins/git.ligo/gstlal/tacc"
+SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
error = $(cluster)-$(process).err
queue 1
Submitting this job leads to an error:
FATAL: container creation failed: mount ->/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor error: can't remount /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor: no such file or directory
I'm not sure of the root cause, since the `/ligo/home/ligo.org/` NFS directory is bound in the compute node config. Other changes I have tried that still cause the same FATAL error:
- binding the full path on the compute node, which warns "destination is already in the mount point list"
- removing the environment variables from the submit file
Is there some other configuration change (either in the .sub file or on the compute node) that would work?
Thank you for any guidance.
Bryce
-----
Bryce Cousins
LIGO R&D Engineer
Penn State Institute for Computational and Data Sciences
814-867-3035