
Re: [HTCondor-users] condor_ssh_to_job broken with 8.8 on CentOS 7



(re-sent to the list)

Hi all,

to summarize, here are the modifications I have collected so far and the issues that remain on CentOS 7,
using Singularity with the setuid root bit set and HTCondor 8.8.1.

/usr/bin/nsenter ("real" nsenter is renamed to nsenter.real):
----------------------------------------------------------
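#!/bin/bash
# Replace a standalone "-a" with "-m -i -p"; pass all other arguments through unchanged.
args=()
for arg in "$@"; do
  if [ "$arg" = "-a" ]; then args+=(-m -i -p); else args+=("$arg"); fi
done
export SHELL=/bin/bash
export PATH=$PATH:/bin
exec /usr/bin/nsenter.real "${args[@]}"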
----------------------------------------------------------
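The argument rewriting done by the wrapper can be tried on its own, without touching the real nsenter. The following is only a sketch: the function name rewrite_nsenter_args is made up for illustration, and the example invocation assumes that the caller passes "-a" as a separate argument, which is inferred from the process tree further down rather than from the HTCondor sources. It replaces the flag per argument, so arguments containing spaces, or containing "-a" as part of a longer word, are left alone.

```shell
#!/bin/bash
# Hypothetical helper (not part of HTCondor or util-linux):
# prints the rewritten argument list, one argument per line.
rewrite_nsenter_args() {
  local arg
  for arg in "$@"; do
    if [ "$arg" = "-a" ]; then
      # Expand the single flag into the three namespace flags.
      printf '%s\n' -m -i -p
    else
      # Everything else is passed through untouched.
      printf '%s\n' "$arg"
    fi
  done
}

# Example call, with arguments assumed from the process tree below.
rewrite_nsenter_args -a -t 29354 /usr/sbin/chroot --userspec 67803 /proc/29354/root
```

Printing one argument per line makes it easy to see that nothing is split or merged before the list is handed to nsenter.real.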

In /usr/libexec/condor/condor_ssh_to_job_shell_setup, comment out the following lines:
----------------------------------------------------------
# kill the dummy sleep job if this is an interactive job
#if grep -q '^InteractiveJob = true' "${_CONDOR_SCRATCH_DIR}/.job.ad"; then
#  if [ "${_CONDOR_JOB_PIDS}" != "" ]; then
#    kill "${_CONDOR_JOB_PIDS}" 2>/dev/null
#    _CONDOR_JOB_PIDS=""
#  fi
#fi
----------------------------------------------------------

With this, submitting interactive jobs or attaching to running jobs works, with the following remaining errors:
----------------------------------------------------------
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
----------------------------------------------------------

Also, this is not very useful for interactive jobs, since they exit after a few minutes, when the dummy sleep job ends.
Consider the following process tree on the execute machine:
----------------------------------------------------------
condor   29323  0.5  0.0  90240  8972 ?        Ss   14:52   0:00      \_ condor_starter -f -a slot1_2 submitd.example.com
freyermu 29345  0.0  0.0  20000   828 ?        SNs  14:52   0:00          \_ /usr/libexec/singularity/bin/action-suid /bin/sleep 180
freyermu 29354  0.0  0.0  27288   856 ?        SN   14:52   0:00          |   \_ shim-init                                /bin/sleep 180
freyermu 29355  0.0  0.0   4116    72 ?        SN   14:52   0:00          |       \_ /bin/sleep 180
freyermu 29369  1.4  0.0 125228  4680 ?        SNs  14:52   0:00          \_ sshd: freyermu [priv]
freyermu 29371  0.0  0.0 125228  1804 ?        SN   14:52   0:00          |   \_ sshd: freyermu@pts/4
freyermu 29372  0.0  0.0  55968  4488 pts/4    SNs+ 14:52   0:00          |       \_ /usr/bin/condor_docker_enter
root     29373  0.0  0.0 116824   800 ?        S    14:52   0:00          \_ /usr/bin/nsenter.real -m -i -p -t 29354 /usr/sbin/chroot --userspec 67803 /proc/29354/root
freyermu 29377  0.0  0.0 108236  1504 ?        S    14:52   0:00              \_ /bin/bash -i
----------------------------------------------------------
As you can see, as soon as the "sleep" exits, Singularity shuts down and both sshd and nsenter are killed.

I don't have any good ideas for the two remaining issues yet (there is probably no way around changing the implementation),
so any suggestions are welcome.

Cheers and many thanks!
	Oliver

