
Re: [HTCondor-users] Limiting memory used on the worker node with c-groups

I do think your problem is as simple as Thomas' question: figuring out why oom_control is set to disabled. These cgroup settings are inherited hierarchically, so the culprit could be the htcondor cgroup itself or any cgroup above it. It could even be set system-wide. The documented behavior is:

When the OOM killer is disabled, tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.
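One quick way to find where the flag comes from is to walk upward from a job's cgroup and print memory.oom_control at every level. A sketch, assuming cgroup v1 and a hypothetical slot path (substitute the actual cgroup of one of your jobs):

```shell
# Walk from a leaf memory cgroup up to the hierarchy root, printing
# memory.oom_control wherever it exists, to spot the level at which
# oom_kill_disable was set.
walk_oom_control() {
  cg="$1"; root="$2"
  while :; do
    if [ -f "$cg/memory.oom_control" ]; then
      echo "$cg:"
      cat "$cg/memory.oom_control"
    fi
    if [ "$cg" = "$root" ] || [ "$cg" = "/" ]; then break; fi
    cg=$(dirname "$cg")
  done
}

# Hypothetical slot path -- adjust to what condor_starter reports.
walk_oom_control /sys/fs/cgroup/memory/htcondor/slot1 /sys/fs/cgroup/memory
```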

As for why I don't like hard enforcement of cgroup memory limits: Miron has previously described HTCondor as not being the "CPU police." Well, I don't think it should be the memory police either. My old job as an HTCondor administrator was to protect the availability of the *HTCondor service* for all users from a job crashing the system.

In real-world situations, most jobs can sneak above their memory limit and it's not a big deal because other jobs are below their limit. Why make it a big deal?

In fact, this is how HTCondor uses the kernel to enforce CPU resources. Jobs can go above their CPU allocation as long as spare cycles are available; when they aren't, the limit is imposed strictly.

Beyond the philosophy of the matter, some code doesn't have a memory requirement that can be predicted in advance, or may be sensitive to specific parameter choices. For example, a job's memory usage might follow a statistical pattern in which 95% of runs stay below 4GB, 99% below 10GB, and so on.

Do I actually want to encourage the user to set a 10GB limit on all jobs? Or to ask them to spend their time creating a range of submit files with different memory values?
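Concretely, soft enforcement plus a modest request handles that pattern. A sketch (CGROUP_MEMORY_LIMIT_POLICY is a real HTCondor knob; the numbers are illustrative, not recommendations):

```
# condor_config on the worker node: enforce memory via cgroups, but
# softly -- a job may exceed its request while the node has memory free.
CGROUP_MEMORY_LIMIT_POLICY = soft

# Submit file: request the typical need (the 95th percentile in the
# example above), not the worst case.
request_memory = 4096
```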


On Mon, Apr 27, 2020 at 1:58 AM <jean-michel.barbet@xxxxxxxxxxxxxxxxx> wrote:
On 4/24/20 5:16 PM, tpdownes@xxxxxxxxx wrote:
> JM-
> When something in the universe goes wrong with HTCondor and CGroups, I
> feel a little twitch. When you say the processes are in the "deferred"
> state, do you mean they are in the "D" state according to ps? Or do you
> mean the actual literal "job deferral" options in "htcondor"?

Hello Tom,

Thank you very much. You are right, I misused the term "deferred",
I was talking about "D" state.

> https://support.microfocus.com/kb/doc.php?id=7002725
> A common reason for a job getting stuck in D is a bad / overloaded
> remote filesystem (NFS, etc.). Is that a possibility here?

Using the command mentioned in the article you cite, I see lines such
as:

ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep sgmali
30138 30333 sgmali0+ D  87.4 aliroot   mem_cgroup_oom_synchronize
30341 30435 sgmali0+ D   0.3 perl      mem_cgroup_oom_synchronize
12455 30605 sgmali0+ D   0.0 perl      mem_cgroup_oom_synchronize
12594 30869 sgmali0+ D   0.0 perl      mem_cgroup_oom_synchronize

> FYI: even if you didn't understand my presentation, you made the type of
> choice I recommend. Use "soft" but lie a bit about how much RAM you
> have. It allows more jobs to match while still ensuring that CGroups can
> do its job.
It is always more difficult to fully understand slides when you cannot
hear the presenter :-) I hope no offense is perceived here.


a) these processes in the "D" state started to appear after I activated
  the "soft" mode on the workers, so I think there is a link.

b) I do not exclude the possibility that the jobs themselves are
  reacting badly to a signal. These are production jobs of the
  LHC ALICE VO, and I run only this VO (so I have no comparison).

c) meanwhile, I modified one worker to use the "hard" mode and it seems
  to behave OK; I did not find removed jobs on this worker in the last
  24h or so. This is the one point I did not understand: what is the
  potential issue with the "hard" mode?

Thank you.


Jean-michel BARBET                    | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France    | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite | E-Mail: barbet@xxxxxxxxxxxxxxxxx