
Re: [HTCondor-users] Singularity container creation failed: can't remount



Hi Greg,

> This error looks like a recursive mount. If you remove the getenv =
> true and the SINGULARITY_PWD, does it work?


This was it -- removing those parameters allowed the container to run. Thanks for the help!
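For anyone who finds this thread later, here is a rough sketch of the submit file with those two settings dropped. It is reconstructed from the test.sub quoted below rather than copied from the working file, so treat it as an illustration only:

```
# test.sub with "getenv = True" and the SINGULARITY_PWD environment line removed
# (sketch reconstructed from the original submit file; everything else unchanged)
universe = vanilla
executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
+SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
error = $(cluster)-$(process).err
queue 1
```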

Bryce


-----

Bryce Cousins

LIGO R&D Engineer

Penn State Institute for Computational and Data Sciences

bfc5288@xxxxxxx

814-867-3035



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of htcondor-users-request@xxxxxxxxxxx <htcondor-users-request@xxxxxxxxxxx>
Sent: Wednesday, September 23, 2020 09:39
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: HTCondor-users Digest, Vol 82, Issue 26
 
Date: Tue, 22 Sep 2020 14:50:21 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Singularity container creation failed:
        can't remount /user/path: no such file or directory
Message-ID: <ab763418-4283-faf1-fcad-499d985274bc@xxxxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"; Format="flowed"

Bryce:


I don't think this is the problem, but be aware that we've seen problems
with Singularity when we've explicitly set the :rw permissions for a
mount. Some versions seem to only work with either :ro or nothing (to
mean :rw).
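
As a sketch of what that would look like for the bind expression in your config (quoted below), assuming read-write is what you want for both paths, you would just drop the explicit suffix. I haven't tested this against your setup:

```
# /etc/condor/config.d/70-singularity.conf
# Same binds as before, with no explicit :rw (no suffix is taken to mean read-write)
SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org,/localscratch:/localscratch"
```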


This error looks like a recursive mount. If you remove the getenv =
true and the SINGULARITY_PWD, does it work?


-greg

On 9/18/20 9:04 AM, Cousins, Bryce S wrote:
> Hello,
>
> I run an HTCondor v8.8.10 pool and would like to enable user-defined
> Singularity images for submitted jobs, but when submitting test jobs
> I'm running into issues of the form:
>
>     FATAL: container creation failed: mount ->/user/path error:
> can't remount /user/path: no such file or directory
>
> The Singularity image itself is fine, and executes without issues on
> the login or compute nodes; the error occurs only when submitting an
> HTCondor Singularity job.
>
> I set up the Singularity compute nodes with the following
> configuration, based on the Singularity Support docs:
>
> # /etc/condor/config.d/70-singularity.conf
> SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
> SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage
> SINGULARITY_TARGET_DIR = /srv
> SINGULARITY_BIND_EXPR = "/cvmfs,/ligo/home/ligo.org:/ligo/home/ligo.org:rw,/localscratch:/localscratch:rw"
> SINGULARITY_IS_SETUID = false
>
> HAS_SINGULARITY = HasSingularity
> STARTD_ATTRS = $(STARTD_ATTRS),HAS_SINGULARITY
>
> A test submit file is:
>
> # /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/test.sub
> universe = vanilla
> executable = /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/containerInfo.sh
> getenv = True
> environment = "SINGULARITY_PWD=/ligo/home/ligo.org/bryce.cousins/git.ligo/gstlal/tacc"
> +SingularityImage = "/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor/gstlal.simg"
> error = $(cluster)-$(process).err
> queue 1
>
> Submitting this job leads to an error:
> FATAL: container creation failed: mount
> ->/ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor
> error: can't remount
> /ligo/home/ligo.org/bryce.cousins/workflows/singularity_condor: no
> such file or directory
>
> I'm not sure of the root cause, since the `/ligo/home/ligo.org/` NFS
> directory is bound in the compute node config. Other changes I have
> tried that still cause the same FATAL error:
>
>   * binding the full path on the compute node, which warns
>     "destination is already in the mount point list"
>   * removing the environment variables from the submit file
>
> Is there some other configuration change (either in the .sub file or
> on the compute node) that would work?
>
> Thank you for any guidance.
>
> Bryce
>
> -----
>
> Bryce Cousins
>
> LIGO R&D Engineer
>
> Penn State Institute for Computational and Data Sciences
> <https://www.icds.psu.edu/>
>
> bfc5288@xxxxxxx
>
> 814-867-3035
>
>