[HTCondor-users] Incompatibility of HTCondor "condor_ssh_to_job" with Apptainer 1.3.0?



Dear HTCondor experts (probably Greg - hello from Bonn! ;-) ),

I finally got around to upgrading a first system to Apptainer 1.3.0, which now uses fuse-overlayfs by default instead of the previous "underlay" approach, which is going to be deprecated in a future release.

Trying to start an interactive job (or to connect to an existing job) now fails with the following output (note: we run Apptainer unprivileged):
...
 Your condor job is running with pid(s) 34563.
 Can't open master pty Bad file descriptor
 read returned, exiting
...


I can pin this down to the following problem:


1) Process tree:

condor    34472  1.7  0.0  91048  8624 ?        Ss   22:18   0:00      \_ condor_starter -f -local-name slot_type_1 -a slot1_2 exp196.physik.uni-bonn.de
freyermu  34563  2.5  0.0 963708 19548 ?        SNsl 22:18   0:00          \_ Apptainer runtime parent
freyermu  34587  0.0  0.0 888016 17256 ?        SNl  22:18   0:00              \_ appinit
freyermu  34622  0.0  0.0   3800  1376 ?        SN   22:18   0:00              |   \_ /bin/sh -c sleep 180 && while test -d ${_CONDOR_SCRATCH_DIR}/.condor_ssh_to_job_1; do /bin/sleep 3; done
freyermu  34623  0.0  0.0   2376   364 ?        SN   22:18   0:00              |       \_ sleep 180
freyermu  34606  1.5  0.0  16200  3092 ?        SN   22:18   0:00              \_ /usr/libexec/apptainer/bin/fuse-overlayfs -f -o allow_other,lowerdir=/var/lib/apptainer/mnt/session/overlay-lowerdir:/var/lib/apptainer/mnt/session/rootfs...


2) Running the following (using any PID "deeper" down, e.g. 34622 or 34623, gives the same result)
    strace -f condor_nsenter -t 34587 -S <my_id> -G <my_gid>
   reveals:
    open("/proc/34587/ns/uts", O_RDONLY)    = 3
    setns(3, 0)                             = 0
    close(3)                                = 0
    open("/proc/34587/ns/pid", O_RDONLY)    = 3
    setns(3, 0)                             = 0
    close(3)                                = 0
    open("/proc/34587/ns/mnt", O_RDONLY)    = 3
    setns(3, 0)                             = 0
    close(3)                                = 0
    setgroups(0, NULL)                      = 0
    setgid(513)                             = 0
    setuid(67803)                           = 0
    ioctl(0, TIOCGWINSZ, {ws_row=58, ws_col=236, ws_xpixel=1891, ws_ypixel=988}) = 0
    open("/dev/ptmx", O_RDWR)               = -1 EACCES (Permission denied)
    ioctl(-1, TIOCSPTLCK, [0])              = -1 EBADF (Bad file descriptor)
    write(2, "Can't open master pty Bad file d"..., 42Can't open master pty Bad file descriptor
    ) = 42
    exit_group(1)                           = ?
    +++ exited with 1 +++
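
A note on reading the trace: the call that fails is the open() of /dev/ptmx (EACCES). The "Bad file descriptor" in the error message appears to come from the subsequent ioctl() on fd -1. A minimal sketch for checking just that one step in isolation, e.g. from a shell that is already inside the job's namespaces, could look like this. The helper name probe_open is made up for illustration and is not part of HTCondor or Apptainer:

```python
import errno
import os


def probe_open(path="/dev/ptmx"):
    """Try to open `path` read-write; return "ok" or the errno name."""
    try:
        # O_NOCTTY: do not let the opened terminal become our controlling tty.
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. 13 -> "EACCES".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"
```

Calling probe_open() with the default path reports whether the pty master device can be opened by the current user in the current namespaces.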

I'm not sure what exactly makes the difference, but
 nsenter -t 34587 -U -m -p -S <my_id> -G <my_gid>
"works", and I can access /dev/ptmx inside.
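
One visible difference between the two invocations is that nsenter -U also enters the user namespace, while the strace above only shows setns() calls for uts, pid and mnt. Whether that is the actual cause is only a guess. To see which namespaces of a target process differ from those of another process, a rough sketch along the following lines may help. The function names are hypothetical, and the proc_root parameter exists only so the helper can be pointed at a different directory:

```python
import os


def read_namespaces(pid, proc_root="/proc"):
    """Return {namespace name: link target} for one process.

    Entries that cannot be read (permissions, process gone) map to None.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink such as "user:[4026531837]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            result[name] = None
    return result


def differing_namespaces(pid_a, pid_b, proc_root="/proc"):
    """List namespace names whose link targets differ between two PIDs.

    Namespaces that are unreadable for either process are skipped.
    """
    a = read_namespaces(pid_a, proc_root)
    b = read_namespaces(pid_b, proc_root)
    return sorted(
        name
        for name in a.keys() & b.keys()
        if a[name] is not None and b[name] is not None and a[name] != b[name]
    )
```

For example, differing_namespaces("self", 34587) would list the namespaces in which the calling process and the "appinit" process from the tree above differ, provided the caller is allowed to read /proc/34587/ns.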

SELinux is not at fault: there are no denials, and disabling it changes nothing.

Any ideas? Do others also see this issue?

Disabling fuse-overlayfs via the Apptainer configuration and forcing it back to underlay (enable overlay = no, enable underlay = yes) seems to fix the problem,
but the Apptainer developers want to remove that implementation at some point.

Cheers,
	Oliver

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
