[HTCondor-users] Bug: cgroup limits not enforced with Singularity containers and condor_ssh_to_job / interactive jobs



Dear HTCondor devs,

Since I don't have access to the new (very well set-up!) JIRA bug tracker, let me use the mailing list to report an issue we observe with versions from good old 8.8 through to 9.0.16:

When starting a Singularity container job and attaching to it (or starting an interactive job), the process tree looks as follows:

  condor      4703  \_ condor_startd
  condor   3396708      \_ condor_starter -f -local-name slot_type_1 -a slot1_1 submitnode.physik.uni-bonn.de
  someuser 3396952          \_ Singularity runtime parent
  someuser 3396965          |   \_ sinit
  someuser 3396988          |       \_ /bin/sh -c sleep 180 && while test -d ${_CONDOR_SCRATCH_DIR}/.condor_ssh_to_job_1; do /bin/slee
  someuser 3396990          |           \_ sleep 180
  someuser 3396997          \_ sshd: someuser [priv]
  someuser 3396999          |   \_ sshd: someuser@pts/0
  someuser 3397000          |       \_ /usr/bin/condor_docker_enter
  someuser 3397020          \_ /usr/bin/nsenter -t 3396988 -S 67803 -G 513 -m -i -p -r -w
  someuser 3397021              \_ /bin/sh -l -i

However, the processes which "attached" later via nsenter do not end up in the same cgroup:

  # cat /sys/fs/cgroup/memory/htcondor/condor_pool_condor_slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/cgroup.procs
  3396952
  3396965
  3396988
  3396990

Consequently, the resource limits (CPU, memory) are not enforced, neither for interactive jobs nor for processes spawned via "condor_ssh_to_job".
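
For anyone who wants to check this on their own nodes, here is a minimal Python sketch of the comparison above: it collects all descendants of the condor_starter and reports those that are missing from the slot's cgroup.procs. It assumes Linux procfs and a cgroup v1 layout like the one shown; the function names are made up for illustration and are not part of HTCondor.

```python
from pathlib import Path


def parse_pids(text):
    """Parse whitespace-separated PIDs, e.g. the content of a cgroup.procs file."""
    return {int(token) for token in text.split()}


def children_map(proc_root="/proc"):
    """Map each parent PID to the set of its child PIDs using /proc/<pid>/stat."""
    children = {}
    for stat in Path(proc_root).glob("[0-9]*/stat"):
        try:
            raw = stat.read_text()
            # The command name may contain spaces or parentheses, so split
            # after the last ')'. The remaining fields start with: state, ppid.
            ppid = int(raw.rsplit(")", 1)[1].split()[1])
        except (OSError, IndexError, ValueError):
            continue  # the process exited while we were scanning
        children.setdefault(ppid, set()).add(int(stat.parent.name))
    return children


def descendants(root, children):
    """Return all PIDs below root (excluding root itself)."""
    seen, stack = set(), [root]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def outside_cgroup(starter_pid, children, cgroup_pids):
    """Return the sorted descendants of the starter that are not in the cgroup."""
    return sorted(descendants(starter_pid, children) - cgroup_pids)
```

On a worker node this could be called as outside_cgroup(starter_pid, children_map(), parse_pids(Path(procs_file).read_text())), where procs_file is the path to the slot's cgroup.procs and starter_pid is the PID of the condor_starter. For the process tree above, it would report the sshd, condor_docker_enter, nsenter and shell processes.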

Ideas for good workarounds (or of course a fix) welcome ;-).

Sadly, I won't make it to HTCondor Europe this year, since it collides with the start of our winter term (technical support for lectures and teaching duties),
but I wish all of you a good time in Italy, and I hope to see you in person again in one of the coming years!

Cheers from Bonn,
	Oliver

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
