
Re: [HTCondor-users] nodes without cgrouped jobs?

Hi again,

I think I have managed to replicate the disappearance of the jobs'
cgroups manually, at least partially. It looks to be due to an issue
with systemd plus a typo on our side... [*]

  ~~> Condor is innocent

Cheers and sorry for the noise!

ps: unfortunately, I do not completely understand the behaviour and
would appreciate any ideas from systemd experts ;) [**]

- we are distributing a systemd unit via Puppet, which starts a
Singularity container/runscript (that binds the root path internally):
  ExecStart=/usr/bin/singularity run --bind /:/rootfs:ro
- when the unit is distributed/updated onto a node, Puppet triggers a
  systemctl daemon-reload
- and ensures that the service is active
- due to a bug (a forgotten template variable), the unit's template could
contain a dangling condition, i.e.,
  ExecStart=/usr/bin/singularity run --bind /:/rootfs:ro
- when this (apparently defective) unit was started (and ensured by
Puppet...), the existing job slices in the cpu and memory controllers
were wiped out!? (the condor.service parent slices survived the unit start)
- with a fixed unit, the job slices survive (re)starts of the service!
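For illustration, a minimal sketch of what such a templated unit might
look like after rendering; the unit description, image path, and the
exact dangling directive are assumptions on my part -- the real broken
line is whatever the forgotten variable left behind:

```ini
[Unit]
Description=export rootfs via Singularity
# Hypothetical: a forgotten template variable could render an empty,
# "dangling" condition line like the one below
ConditionPathExists=

[Service]
# the image argument is an assumed placeholder
ExecStart=/usr/bin/singularity run --bind /:/rootfs:ro /path/to/image.sif
Restart=on-failure

[Install]
WantedBy=multi-user.target
```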

- what I do not fully understand is why/how the processes lose their
cgroup slices, or why/how systemd/the kernel does this. The PIDs are
unaffected - so I would have naively assumed that, once assigned to a
cgroup, a process would stay there. But apparently the cgroups get
removed(?) and the PIDs are appended to the next parent group(?)
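The per-PID membership can be inspected directly from /proc, without
systemd; a small sketch (plain POSIX shell). One detail that may explain
the observation: a (v1) cgroup cannot be removed while it still contains
tasks, so whoever removes it must first migrate the member PIDs
elsewhere, typically to the parent group -- the PIDs survive, the group
does not:

```shell
#!/bin/sh
# Print the cgroup membership of a PID as the kernel records it.
# cgroup v1: one "<id>:<controllers>:<path>" line per hierarchy;
# cgroup v2: a single "0::<path>" line.
pid_cgroup() {
    cat "/proc/$1/cgroup"
}

pid_cgroup $$    # membership of the current shell
```

Running this before and after a daemon-reload/restart cycle should show
whether the path component of the job PIDs actually changes.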

- the slices only get wiped out when the exec is started through
systemd; I have not been able to reproduce the behaviour by performing
each step manually.

- what kind of namespace view does systemd have? I see systemd processes
belonging to PPID=1 as well as PPID=0(!?)
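On the PPID=0 observation: as far as I know that is not a namespace
artefact. PID 1 (and the kernel's kthreadd, PID 2) are parented directly
by the kernel, which reports parent PID 0. A quick check via /proc,
where field 4 of /proc/<pid>/stat is the PPID:

```shell
#!/bin/sh
# Field 4 of /proc/<pid>/stat is the PPID. For PID 1 it is 0: the
# process is parented by the kernel itself, not by another process.
# (Field-wise parsing is safe here as long as the comm field -- field 2,
# in parentheses -- contains no spaces, which holds for PID 1.)
awk '{print "PID " $1 " PPID " $4}' /proc/1/stat
```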
