Re: [HTCondor-users] spontaneous reboots after enabling cgroups
- Date: Fri, 2 Aug 2013 23:57:27 +0200
- From: Martin Bukatovic <martin.bukatovic@xxxxxxxxx>
- Subject: Re: [HTCondor-users] spontaneous reboots after enabling cgroups
Dear condor list,
On Thu, Aug 1, 2013 at 3:37 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> On Jul 23, 2013, at 7:58 AM, Chris Filo Gorgolewski <krzysztof.gorgolewski@xxxxxxxxx> wrote:
>> On Thu, Jun 27, 2013 at 2:31 AM, Jason Ferrara <jason.ferrara@xxxxxxxxxxxxx> wrote:
>>> I have a pool of machines running CentOS 6.4, Kernel 2.6.32-358, and HTCondor 7.9.4.
>>> Today, in order to try to stop jobs which underestimate their memory usage from making the machines swap a lot and get slow, I enabled cgroups and set
>>> CGROUP_MEMORY_LIMIT_POLICY = soft
>>> RESERVED_MEMORY = 1024
>> I have similar issue (jobs which underestimate their memory usage), but I wasn't sure that CGROUP will solve this. Do I understand correctly that setting CGROUP_MEMORY_LIMIT_POLICY to either "soft" or "hard" will disable swapping for all condor jobs?
> - "hard" will kill the job as soon as it goes over its requested memory.
> - "soft" will kill jobs that are over their requested memory only when the kernel believes memory is tight.
This is not a correct description.
First of all, it's worth noting that neither the hard nor the soft limit
should kill any process unless there is some other good reason for the
OOM killer to intervene. Such a reason may be running out of memory
completely, but definitely not just going over the soft or hard memory
limit. And even when the OOM killer is triggered, you can specify what
that will mean for the processes in a particular cgroup via the
memory.oom_control file of the memory cgroup controller.
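To check what a given cgroup is currently set to, you can read that file
directly. Below is a minimal Python sketch, not anything condor itself
ships. It assumes the cgroup v1 file format, where memory.oom_control
holds "name value" lines such as oom_kill_disable and under_oom. The
function names and the cgroup_dir argument are made up for illustration,
and the directory depends on where the memory controller is mounted on
your machines.

```python
def parse_oom_control(text):
    """Parse the contents of a cgroup v1 memory.oom_control file.

    Each line has the form "<name> <integer>", for example
    "oom_kill_disable 0". Returns a dict mapping name to int.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            settings[parts[0]] = int(parts[1])
    return settings


def read_oom_control(cgroup_dir):
    # cgroup_dir is an assumption: the directory of one memory cgroup,
    # wherever the memory controller happens to be mounted locally.
    with open(cgroup_dir + "/memory.oom_control") as f:
        return parse_oom_control(f.read())
```

If oom_kill_disable reads 1, the OOM killer is disabled for that cgroup,
and tasks that hit the limit are paused rather than killed until memory
is freed.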
Unfortunately there is an exception to this: a known bug in the
RHEL/CentOS kernel kills processes of a cgroup with a small hard
memory limit when the limit is breached. But this *is not* correct
behaviour. Kernel developers are working on a fix, see the bugzilla
for details: https://bugzilla.redhat.com/show_bug.cgi?id=870011
Back to CGROUP_MEMORY_LIMIT_POLICY, see what the upstream condor documentation says:
<from condor docs>
If the hard limit is in force, then the total amount of physical
memory used by the sum of all processes in this job will not be
allowed to exceed the limit. If the processes try to allocate more
memory, the allocation will succeed, and virtual memory will be
allocated, but no additional physical memory will be allocated. The
system will keep the amount of physical memory constant by swapping
some page from that job out of memory.
If the soft limit is in place, the job will be allowed to go over the
limit if there is free memory available on the system. Only when there
is contention between other processes for physical memory will the
system force physical memory into swap and push the physical memory
used towards the assigned limit.
</from condor docs>
Note that this description is consistent with the kernel documentation
for cgroups, which was referenced earlier in this thread.
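To make the difference between the two limits concrete, here is a small
Python sketch that classifies a cgroup's state from three numbers. It
assumes the values come from the cgroup v1 files memory.usage_in_bytes,
memory.soft_limit_in_bytes and memory.limit_in_bytes. The function name
and the returned labels are my own and appear in neither the condor nor
the kernel docs. Note that none of the three states means "killed".

```python
def describe_memory_state(usage, soft_limit, hard_limit):
    """Classify a memory cgroup by its physical memory usage in bytes.

    usage      -- value of memory.usage_in_bytes
    soft_limit -- value of memory.soft_limit_in_bytes
    hard_limit -- value of memory.limit_in_bytes
    """
    if usage >= hard_limit:
        # Physical memory cannot grow further; extra pages of this
        # cgroup get swapped out instead.
        return "at-hard-limit"
    if usage > soft_limit:
        # Tolerated while memory is free; under contention the kernel
        # tries to push the cgroup back towards its soft limit.
        return "over-soft-limit"
    return "within-limits"


MB = 1024 * 1024
print(describe_memory_state(300 * MB, 256 * MB, 512 * MB))
```

With a usage of 300 MB, a soft limit of 256 MB and a hard limit of
512 MB, this reports the cgroup as being over its soft limit, which is
harmless as long as the rest of the machine has free memory.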
>> Furthermore, the free memory used in the "soft" policy is calculated based on the current system state not taken from the RESERVED_MEMORY variable?
> No -- soft-memory kills are controlled by the kernel. From https://www.kernel.org/doc/Documentation/cgroups/memory.txt:
> 7. Soft limits
> Soft limits allow for greater sharing of memory. The idea behind soft limits
> is to allow control groups to use as much of the memory as needed, provided
> a. There is no memory contention
> b. They do not exceed their hard limit
> When the system detects memory contention or low memory, control groups
> are pushed back to their soft limits. If the soft limit of each control
> group is very high, they are pushed back as much as possible to make
> sure that one control group does not starve the others of memory.
> Please note that soft limits is a best-effort feature; it comes with
> no guarantees, but it does its best to make sure that when memory is
> heavily contended for, memory is allocated based on the soft limit
> hints/setup. Currently soft limit based reclaim is set up such that
> it gets invoked from balance_pgdat (kswapd).
> I'm not a kernel programmer, but looking at the relevant kernel code, it seems that the cgroup is checked prior to swapping in most cases.