Dear condor list,
On Thu, Aug 1, 2013 at 3:37 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
On Jul 23, 2013, at 7:58 AM, Chris Filo Gorgolewski wrote:
On Thu, Jun 27, 2013 at 2:31 AM, Jason Ferrara wrote:
I have a pool of machines running CentOS 6.4, Kernel 2.6.32-358, and
Today, to try to stop jobs that underestimate their memory usage from
making the machines swap a lot and slow down, I enabled cgroups with:
CGROUP_MEMORY_LIMIT_POLICY = soft
RESERVED_MEMORY = 1024
I have a similar issue (jobs that underestimate their memory usage), but
I wasn't sure whether cgroups would solve it. Do I understand correctly that
setting CGROUP_MEMORY_LIMIT_POLICY to either "soft" or "hard" will disable
swapping for all condor jobs?
- "hard" will kill the job as soon as it goes over its requested memory.
- "soft" will kill jobs that are over their requested memory only when
the kernel believes memory is tight.
This is not a correct description.
First of all, it's worth noting that neither hard nor soft limits should
kill any process unless there is some other good reason for the OOM
killer to intervene. Such a reason may be running out of memory
completely, but definitely not just going over the soft or hard memory
limit. And even when the OOM killer is triggered, you can specify what
that means for the processes in a particular cgroup via the
memory.oom_control file of the memory cgroup controller.
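For illustration, a minimal Python sketch of reading and changing memory.oom_control for a single cgroup, assuming the cgroup v1 memory controller. The function names are hypothetical, and the cgroup directory you pass in is an assumption too: it depends on where the memory controller is mounted on your machines and on how HTCondor names its per-job cgroups. Writing 1 disables the OOM killer for that cgroup (tasks that hit the limit wait instead of being killed); writing 0 re-enables it.

```python
from pathlib import Path


def read_oom_control(cgroup_dir):
    """Parse a cgroup v1 memory.oom_control file into a dict.

    The file holds lines such as "oom_kill_disable 0" and "under_oom 0".
    """
    text = Path(cgroup_dir, "memory.oom_control").read_text()
    settings = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key:
            settings[key] = int(value)
    return settings


def set_oom_kill_disable(cgroup_dir, disable):
    """Write 1 to disable, or 0 to re-enable, the OOM killer for this cgroup.

    cgroup_dir is a hypothetical per-job cgroup path supplied by the caller;
    on a real system this needs sufficient privileges.
    """
    Path(cgroup_dir, "memory.oom_control").write_text("1\n" if disable else "0\n")
```

Reading the file back after disabling should show oom_kill_disable as 1, and under_oom becomes 1 while tasks in the group are stalled waiting for memory.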
Unfortunately, there is an exception to this: a known bug in the
RHEL/CentOS kernel will kill the processes of a cgroup with a small hard
memory limit when that limit is breached. But this *is not* correct
behaviour. Kernel developers are working on a fix; see the bugzilla
entry for details: https://bugzilla.redhat.com/show_bug.cgi?id=870011
Back to CGROUP_MEMORY_LIMIT_POLICY; here is what the upstream condor
documentation says:
<from condor docs>
If the hard limit is in force, then the total amount of physical
memory used by the sum of all processes in this job will not be
allowed to exceed the limit. If the processes try to allocate more
memory, the allocation will succeed, and virtual memory will be
allocated, but no additional physical memory will be allocated. The
system will keep the amount of physical memory constant by swapping
some pages from that job out of memory.
If the soft limit is in place, the job will be allowed to go over the
limit if there is free memory available on the system. Only when there
is contention between other processes for physical memory will the
system force physical memory into swap and push the physical memory
used towards the assigned limit.
</from condor docs>
Note that this description is consistent with the kernel documentation
for cgroups, which was referenced earlier in this thread.
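A rough sketch of how the policy values might map onto the cgroup v1 memory controller: "hard" onto memory.limit_in_bytes, "soft" onto memory.soft_limit_in_bytes, and "none" onto no limit at all. This mapping is an assumption for illustration, not HTCondor's actual code; the function name and the cgroup directory argument are hypothetical, and the request is taken to be in MB, as with RequestMemory.

```python
from pathlib import Path

# Assumed mapping from CGROUP_MEMORY_LIMIT_POLICY values to cgroup v1 files.
LIMIT_FILES = {
    "hard": "memory.limit_in_bytes",       # ceiling on the job's physical memory
    "soft": "memory.soft_limit_in_bytes",  # target the kernel pushes back to under contention
    "none": None,                          # no limit is written
}


def apply_memory_limit(cgroup_dir, policy, request_mb):
    """Write request_mb (in MB) as the job's hard or soft limit.

    Returns the path of the file written, or None for the "none" policy.
    """
    if policy not in LIMIT_FILES:
        raise ValueError(f"unknown policy: {policy!r}")
    filename = LIMIT_FILES[policy]
    if filename is None:
        return None
    limit_bytes = request_mb * 1024 * 1024  # MB -> bytes
    path = Path(cgroup_dir, filename)
    path.write_text(f"{limit_bytes}\n")
    return path
```

The point of the sketch is that the two policies set different knobs. A hard limit caps the job's resident memory and forces its own pages out to swap. A soft limit is only a hint that the kernel acts on when memory is contended.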
Furthermore, the free memory used in the "soft" policy is calculated
based on the current system state not taken from the RESERVED_MEMORY
No. Soft-memory kills are controlled by the kernel. From the kernel's
memory cgroup documentation:
7. Soft limits
Soft limits allow for greater sharing of memory. The idea behind soft limits
is to allow control groups to use as much of the memory as needed, provided
a. There is no memory contention
b. They do not exceed their hard limit
When the system detects memory contention or low memory, control groups
are pushed back to their soft limits. If the soft limit of each control
group is very high, they are pushed back as much as possible to make
sure that one control group does not starve the others of memory.
Please note that soft limits is a best-effort feature; it comes with
no guarantees, but it does its best to make sure that when memory is
heavily contended for, memory is allocated based on the soft limit
hints/setup. Currently soft limit based reclaim is set up such that
it gets invoked from balance_pgdat (kswapd).
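To make "pushed back to their soft limits" concrete, here is a toy Python model. It is a simplification under stated assumptions, not the kernel's algorithm. When some amount of memory must be reclaimed, it repeatedly takes one unit from whichever group is furthest above its soft limit. It never pushes a group below its soft limit.

```python
def soft_limit_reclaim(groups, to_reclaim):
    """Toy model of soft-limit reclaim (not actual kernel code).

    groups maps a group name to (usage, soft_limit) in arbitrary units.
    Returns the new usage for each group. Groups are only shrunk while
    they exceed their soft limit; any amount that cannot be reclaimed
    that way is simply left unreclaimed.
    """
    usage = {name: u for name, (u, _) in groups.items()}
    soft = {name: s for name, (_, s) in groups.items()}
    remaining = to_reclaim
    while remaining > 0:
        # Pick the group exceeding its soft limit by the largest amount.
        name = max(usage, key=lambda n: usage[n] - soft[n])
        if usage[name] - soft[name] <= 0:
            break  # every group is at or below its soft limit
        usage[name] -= 1
        remaining -= 1
    return usage
```

With group a at usage 600 (soft limit 200) and group b at usage 500 (soft limit 400), reclaiming 400 units first takes 300 from a. At that point both groups exceed their soft limits by 100, so the remaining 100 units are split evenly, leaving a at 250 and b at 450.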
I'm not a kernel programmer, but looking at the relevant kernel code, it
seems that the cgroup is checked prior to swapping in most cases.
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with
You can also unsubscribe by visiting
The archives can be found at: