
Re: [HTCondor-users] Limiting memory used on the worker node with c-groups

I do think your problem is as simple as Thomas' question: figuring out why oom_control is set to disabled. These cgroup settings are inherited hierarchically, so the culprit could be the htcondor cgroup itself or any cgroup above it. It could even be set system-wide. The documented behavior is:

When the OOM killer is disabled, tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.
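One quick way to find where the flag comes from is to walk upward from a job's cgroup and print memory.oom_control at every level. A sketch, assuming cgroup v1 and a hypothetical slot path (substitute the actual cgroup of one of your jobs):

```shell
# Walk from a leaf memory cgroup up to the hierarchy root, printing
# memory.oom_control wherever it exists, to spot the level at which
# oom_kill_disable was set.
walk_oom_control() {
  cg="$1"; root="$2"
  while :; do
    if [ -f "$cg/memory.oom_control" ]; then
      echo "$cg:"
      cat "$cg/memory.oom_control"
    fi
    if [ "$cg" = "$root" ] || [ "$cg" = "/" ]; then break; fi
    cg=$(dirname "$cg")
  done
}

# Hypothetical slot path -- adjust to what condor_starter reports.
walk_oom_control /sys/fs/cgroup/memory/htcondor/slot1 /sys/fs/cgroup/memory
```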

As for why I don't like hard enforcement of cgroup memory limits: Miron has previously described HTCondor as not being the "CPU police." Well, I don't think it should be the memory police either. My old job as an HTCondor administrator was to protect the availability of the *HTCondor service* for all users from a job crashing the system.

In real-world situations, most jobs can sneak above their memory limit and it's not a big deal because other jobs are below their limit. Why make it a big deal?

In fact, this is how HTCondor uses the kernel to enforce CPU resources. Jobs can go above their CPU allocation as long as spare cycles are available; when they aren't, the limit is imposed strictly.

Beyond the philosophy of the matter, some code doesn't have a memory requirement that can be predicted in advance, or may be sensitive to specific parameter choices. For example, a job's memory usage might follow a statistical pattern in which 95% of runs stay below 4GB, 99% below 10GB, and so on.

Do I actually want to encourage the user to set a 10GB limit on all jobs? Or to ask them to spend their time creating a range of submit files with different memory values?
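Concretely, soft enforcement plus a modest request handles that pattern. A sketch (CGROUP_MEMORY_LIMIT_POLICY is a real HTCondor knob; the numbers are illustrative, not recommendations):

```
# condor_config on the worker node: enforce memory via cgroups, but
# softly -- a job may exceed its request while the node has memory free.
CGROUP_MEMORY_LIMIT_POLICY = soft

# Submit file: request the typical need (the 95th percentile in the
# example above), not the worst case.
request_memory = 4096
```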


On Mon, Apr 27, 2020 at 1:58 AM <jean-michel.barbet@xxxxxxxxxxxxxxxxx> wrote:
On 4/24/20 5:16 PM, tpdownes@xxxxxxxxx wrote:
> JM-
> When something in the universe goes wrong with HTCondor and CGroups, I
> feel a little twitch. When you say the processes are in the "deferred"
> state, do you mean they are in the "D" state according to ps? Or do you
> mean the actual literal "job deferral" options in "htcondor"?

Hello Tom,

Thank you very much. You are right, I misused the term "deferred",
I was talking about "D" state.

> https://support.microfocus.com/kb/doc.php?id=7002725
> A common reason for a job getting stuck in D is a bad / overloaded
> remote filesystem (NFS, etc.). Is that a possibility here?

Using the command mentioned in the article you cite, I see lines such
as:

ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep sgmali
30138 30333 sgmali0+ D  87.4 aliroot   mem_cgroup_oom_synchronize
30341 30435 sgmali0+ D   0.3 perl      mem_cgroup_oom_synchronize
12455 30605 sgmali0+ D   0.0 perl      mem_cgroup_oom_synchronize
12594 30869 sgmali0+ D   0.0 perl      mem_cgroup_oom_synchronize

> FYI: even if you didn't understand my presentation, you made the type of
> choice I recommend. Use "soft" but lie a bit about how much RAM you
> have. It allows more jobs to match while still ensuring that CGroups can
> do its job.
It is always more difficult to fully understand slides when you cannot
hear the presenter :-) I hope no offense is perceived here.


a) these processes in the "D" state started to appear after I activated
  the "soft" mode on the workers, so I think there is a link.

b) I do not exclude the possibility that the jobs themselves are
  reacting badly to a signal. These are production jobs of the
  LHC ALICE VO, and I run only this VO (so I have no comparison).

c) meanwhile, I modified one worker to use the "hard" mode and it seems
  to behave OK; I did not find removed jobs on this worker in the last
  24h or so. This is the one point I did not understand: what is the
  potential issue with the "hard" mode?

Thank you.


Jean-michel BARBET                    | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France    | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite | E-Mail: barbet@xxxxxxxxxxxxxxxxx