Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Limiting memory used on the worker node with c-groups

Date: Thu, 30 Apr 2020 08:15:29 -0500
From: Gregory Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Limiting memory used on the worker node with c-groups


On 4/30/20 4:27 AM, jean-michel.barbet@xxxxxxxxxxxxxxxxx wrote:

On 4/30/20 6:19 AM, tpdownes@xxxxxxxxx wrote:
I do think your problem is as simple as Thomas' question: figuring out why oom_control is set to disabled. These cgroup settings are inherited hierarchically so it could be the htcondor group itself or a cgroup above it. It could even be set system-wide.

HTCondor intentionally sets oom_kill_disable because the starter really needs to know if the job was OOM killed, and treat the job differently than if it just got a normal signal 9. We think it is very unfortunate that the OOM killer kills with the usual signal 9, and not a custom signal just for OOM; we wouldn't need to do this if the OOM signal had its own value. The starter also installs a handler to get notified when the kernel oom-kills a process in the job. This lets the starter clean up the job, and put the job on hold with an appropriate message if it gets OOM killed. If we didn't do this, an OOM-killed job would be killed with signal 9, and probably leave the queue, since from condor's perspective it has exited of its own accord.
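
To illustrate the ambiguity described above, here is a minimal Python sketch (not HTCondor code; the function name is made up for this example). It starts a child process, sends it signal 9 by hand, and returns what the parent sees when it reaps the child. The parent only learns the signal number, which would be the same 9 if the kernel OOM killer had sent it, and that is why a separate notification is needed.

```python
import signal
import subprocess
import sys


def exit_signal_of_killed_child():
    """Start a sleeping child, SIGKILL it, and return the signal number
    the parent observes when it reaps the child (hypothetical helper)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Signal 9, the same signal the kernel OOM killer delivers.
    child.send_signal(signal.SIGKILL)
    child.wait()
    # On POSIX, Popen.returncode is -N when the child died from signal N.
    return -child.returncode


if __name__ == "__main__":
    sig = exit_signal_of_killed_child()
    # Nothing in the wait status says who sent the signal or why.
    print("child terminated by signal", sig)
```

The wait status has no field that separates an OOM kill from any other signal 9, so a parent that relies on it alone cannot treat the two cases differently.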



-greg
