
Re: [HTCondor-users] Limiting memory used on the worker node with c-groups



On 4/30/20 6:19 AM, tpdownes@xxxxxxxxx wrote:
I do think your problem is as simple as Thomas' question: figuring out why oom_control is set to disabled. These cgroup settings are inherited hierarchically so it could be the htcondor group itself or a cgroup above it. It could even be set system-wide.

Hello Tom,

It appears that the OOM killer is enabled at the top level of the
"memory" cgroup, as well as in the "htcondor" cgroup below it, but
becomes disabled at the slot level:

cat /sys/fs/cgroup/memory/htcondor/memory.oom_control
oom_kill_disable 0
under_oom 0

cat /sys/fs/cgroup/memory/htcondor/condor_dlocal_htcondor_slot1@xxxxxxxxxxxxxxxxx/memory.oom_control
oom_kill_disable 1
under_oom 0

I do not know why...
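In case it is useful, this is how I would check all the slot cgroups in one go (a sketch; the mount point and the "htcondor" hierarchy name are copied from the paths above and may differ on other nodes):

```shell
# Print oom_kill_disable for every slot cgroup under htcondor.
# 0 = OOM killer enabled, 1 = disabled (tasks pause instead of dying).
for f in /sys/fs/cgroup/memory/htcondor/*/memory.oom_control; do
    printf '%s: ' "$(basename "$(dirname "$f")")"
    awk '/oom_kill_disable/ {print $2}' "$f"
done
```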

 The defined behavior is:

    When the OOM killer is disabled, tasks that attempt to use more
    memory than they are allowed are paused until additional memory is
    freed.

So the paused processes would correspond to those in "D" state?
On machines with processes in D state, some HTCondor slots have
under_oom set to 1, which seems consistent.
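To spot those paused tasks I simply look for processes in uninterruptible sleep (a generic sketch, not HTCondor-specific):

```shell
# List processes whose state starts with "D" (uninterruptible sleep),
# which is how tasks paused by a memory cgroup show up in ps.
ps -eo pid,stat,comm --no-headers | awk '$2 ~ /^D/'
```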

In real-world situations, most jobs can sneak above their memory limit and it's not a big deal because other jobs are below their limit. Why make it a big deal?

I started to look at this with the aim of preventing a whole worker
node from becoming hung through memory exhaustion, with the only way
to recover it being a manual power-cycle (which we cannot do in the
current lockdown, so we currently have worker nodes unavailable).

I also looked at SYSTEM_PERIODIC_REMOVE on the submitter, but I
learned that it is slow to react, so a pathological job could harm a
worker node before it gets removed...
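For reference, such a policy would be an expression of this general form on the submit machine (a sketch only; MemoryUsage and RequestMemory are standard job ClassAd attributes, but I have not settled on an exact expression):

```
# condor_config on the submit machine: remove jobs whose measured
# memory usage exceeds what they requested (only checked at the
# periodic-expression evaluation interval, hence the slow reaction)
SYSTEM_PERIODIC_REMOVE = (MemoryUsage > RequestMemory)
```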

In fact I am not too unhappy with CGROUP_MEMORY_LIMIT_POLICY = hard,
which I am testing on a single worker node, but I may not yet have
seen all of its drawbacks...
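For completeness, the setting under test is just this knob on the execute node (the comment reflects my understanding of its effect):

```
# condor_config on the worker node: make RequestMemory a hard cgroup
# limit, so a job that exceeds it is OOM-killed instead of paused
CGROUP_MEMORY_LIMIT_POLICY = hard
```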

I have not yet decided which setting to choose. Having processes
paused is painful, since the node does not recover by itself, but it
is still better than losing the node entirely.

Thank you very much for your advice.

JM

--
------------------------------------------------------------------------
Jean-michel BARBET                    | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France    | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite | E-Mail: barbet@xxxxxxxxxxxxxxxxx
------------------------------------------------------------------------