I don't think this is the problem, but be aware that we've seen problems with Singularity when :rw is set explicitly on a bind mount. Some versions seem to work only with :ro or with no option at all (which means :rw).
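As a rough sketch of that suggestion, the bind expression from the compute-node configuration quoted below could be tried with the explicit :rw suffixes dropped. This is untested and assumes the rest of the configuration stays as posted; whether it helps will depend on the installed Singularity version.

```
# Same bind list as in the original message, without the explicit :rw options
# (no option is taken to mean read-write)
SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org,/localscratch:/localscratch"
```

A condor_reconfig on the compute node would be needed for the startd to pick up the change.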
This error looks like a recursive mount. If you remove the getenv = true and the SINGULARITY_PWD, does it work?
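For that test, the submit file from the original message would look roughly like the following, with the getenv and environment lines taken out and everything else left unchanged. This is only a sketch of the experiment, not a confirmed fix.

```
universe = vanilla
executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
# getenv = True and the SINGULARITY_PWD environment line are removed for this test
+SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
error = $(cluster)-$(process).err
queue 1
```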
I run an HTCondor pool (v8.8.10) and would like to enable user-defined Singularity images for submitted jobs, but when submitting test jobs I'm running into errors of the form:
FATAL: container creation failed: mount ->/user/path error: can't remount /user/path: no such file or directory
The Singularity image itself is fine, and executes without issues on the login or compute nodes; the error occurs only when submitting an HTCondor Singularity job.
I set up the Singularity compute nodes with the following configuration, based on the Singularity Support docs:
SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage
SINGULARITY_TARGET_DIR = /srv
SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,/localscratch:/localscratch:rw"
SINGULARITY_IS_SETUID = false
HAS_SINGULARITY = HasSingularity
STARTD_ATTRS = $(STARTD_ATTRS),HAS_SINGULARITY
A test submit file is:
universe = vanilla
executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
getenv = True
environment = "SINGULARITY_PWD=/ligo/home/ligo.org/bryce.cousins/git.ligo/gstlal/tacc"
+SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
error = $(cluster)-$(process).err
queue 1
Submitting this job leads to an error:
FATAL: container creation failed: mount ->/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor error: can't remount /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor: no such file or directory
I'm not sure of the root cause, since the `/ligo/home/ligo.org/` NFS directory is bound in the compute node config. Other changes I have tried that still produce the same FATAL error:
- binding the full path on the compute node, which warns "destination is already in the mount point list"
- removing the environment variables from the submit file
Is there some other configuration change (either in the .sub file or on the compute node) that would work?
Thank you for any guidance.