
Re: [HTCondor-users] condor_ssh_to_job broken with 8.8 on CentOS 7



Dear HTCondor experts, dear Greg,

a dirty hack replacing "-a" with "-m -u -i -n -p -U" still fails miserably,
since Singularity has apparently already exited by the time nsenter is called:
----------------------------------------------------
condor    2967  3.3  0.0  90236  8956 ?        Ss   17:59   0:00      \_ condor_starter -f -a sloti_2_1 cip000.physik.uni-bonn.de
freyermu  3012  0.0  0.0 125228  4684 ?        SNs  17:59   0:00          \_ sshd: freyermu [priv]
freyermu  3015  0.0  0.0 125228  1804 ?        SN   17:59   0:00          |   \_ sshd: freyermu@pts/2
freyermu  3016  0.0  0.0  56000  4588 pts/2    SNs+ 17:59   0:00          |       \_ /usr/bin/condor_docker_enter
root      3018  0.0  0.0 115312  1568 ?        S    17:59   0:00          \_ /bin/bash -x /usr/bin/nsenter -a -t 2989 /usr/sbin/chroot --userspec 67803 /proc/2989/root
root      3023  0.0  0.0 166136  2452 ?        R    17:59   0:00              \_ ps faux
----------------------------------------------------
In the logs, I can only find:
----------------------------------------------------
Feb 26 17:59:13 wn022.baf.physik.uni-bonn.de condor_starter[2967]: singularity enter_ns returned pid 3018
Feb 26 17:59:13 wn022.baf.physik.uni-bonn.de condor_starter[2967]: Process exited, pid=2989, status=255
----------------------------------------------------
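
To rule out a target that has already exited, one could check that the process still exists before calling nsenter. A minimal sketch in Python, for illustration only: the helper name `target_has_namespaces` and its `proc_root` parameter are assumptions made for this example, not HTCondor code.

```python
import os


def target_has_namespaces(pid, proc_root="/proc"):
    """Return True if <proc_root>/<pid>/ns exists, i.e. the process is still
    alive and exposes namespace handles that nsenter could open."""
    return os.path.isdir(os.path.join(proc_root, str(pid), "ns"))
```

Such a check is inherently racy (the process can still exit right after it), so it would only help to narrow down the failure, not to fix it.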

I'll start downgrading all worker nodes for now and hope that 8.8.1 on the central managers and submit machines
can talk fine with 8.6.13 startd machines.

Cheers,
	Oliver

On 26.02.19 at 17:04, Oliver Freyermuth wrote:
Dear HTCondor experts, dear Greg,

we have just now upgraded our production environment to 8.8.1 and interactive jobs with Singularity don't work anymore :-(.

The message we get is:
/usr/bin/nsenter: invalid option -- 'a'

It seems that when introducing the use of nsenter (which is a really good improvement!):
https://github.com/htcondor/htcondor/commit/13198d4efef976689fdd2f102f6aa213fa2f4659#diff-973dfd62df816055fedb382b2307a7af
the parameter "-a" was hardcoded, which is not yet supported in the version of nsenter shipped with CentOS 7.6.
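
A minimal sketch of a possible fallback, in Python for illustration only (this is not how HTCondor implements it): use "-a" only when the installed nsenter advertises "--all" in its help text, and otherwise spell out the individual namespace flags. The function name `nsenter_args` and the idea of probing the help text are assumptions made for this example.

```python
# Hypothetical helper, not part of HTCondor: choose nsenter namespace flags
# depending on whether the installed nsenter advertises "-a/--all".

# Individual namespace flags: mount, UTS, IPC, network, PID, user.
EXPLICIT_NS_FLAGS = ["-m", "-u", "-i", "-n", "-p", "-U"]


def nsenter_args(help_text, target_pid):
    """Build an nsenter argument list for entering the namespaces of target_pid.

    help_text is the output of `nsenter --help` on the machine in question.
    """
    if "--all" in help_text:
        ns_flags = ["-a"]
    else:
        # Older nsenter without "-a": list the namespaces explicitly.
        ns_flags = list(EXPLICIT_NS_FLAGS)
    return ["nsenter"] + ns_flags + ["-t", str(target_pid)]
```

The help text could be obtained, for instance, with `subprocess.run(["nsenter", "--help"], capture_output=True, text=True).stdout`.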

Do you have any suggestion for a simple workaround?
Otherwise, we have to downgrade again (sadly we don't have a full-fledged test-setup, otherwise we would have found that before).

Cheers and all the best,
	Oliver



--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
