
Re: [HTCondor-users] cgroups question/problem



Bob:

Do you have memory.use_hierarchy enabled? The command to see would be something like:

cat /sys/fs/cgroup/htcondor/memory.use_hierarchy
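If you want to script that check across nodes, a rough sketch could look like the following. It assumes a cgroup v1 memory controller, and it uses the same path as the command above as its default, which may not match your mount layout. The function name check_use_hierarchy is made up for illustration.

```shell
# Report whether hierarchical accounting is on for a cgroup v1 memory directory.
# Prints "enabled", "disabled", or "missing" (file absent or unreadable).
# The default directory is an assumption taken from the cat command above.
check_use_hierarchy() {
    dir="${1:-/sys/fs/cgroup/htcondor}"
    file="$dir/memory.use_hierarchy"
    if [ ! -r "$file" ]; then
        echo "missing"
        return 2
    fi
    if [ "$(cat "$file")" = "1" ]; then
        echo "enabled"
        return 0
    fi
    echo "disabled"
    return 1
}
```

Call it as check_use_hierarchy with no argument to use the default, or pass the cgroup directory you care about.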

Some kernels support this and others don't. On those that don't, Condor 8.2.9 would silently fail to register its OOM killer (even though there are some notes in the release saying use_hierarchy should work). The cgroups configuration and the kernel then don't work together well, which ultimately manifests as a subset of jobs failing to be killed and being left in the D state.

This is the kernel patch that matters:

http://lkml.iu.edu/hypermail/linux/kernel/1404.2/00715.html

It's part of 3.16, though RHEL6 may have backported it. It looks to me like RHEL7 backported it to 3.10 (but don't quote me, as I run Debian).

I also see that your soft limit, hard limit and RAM+swap limit are all set, but to similar values. The soft RAM limit appears to be 4096MB on the dot, while the hard memory and swap limits appear to be below 4000MB (and differ from each other by a small amount). Are those the values you expect for your configuration?
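To compare the three limits side by side in MB, something like this sketch may help. It assumes the cgroup v1 file names memory.soft_limit_in_bytes, memory.limit_in_bytes and memory.memsw.limit_in_bytes (the memsw file only exists when swap accounting is enabled). The function name show_limits and the default directory are placeholders, so pass the directory of the cgroup you are actually inspecting.

```shell
# Print the soft, hard and RAM+swap limits of a cgroup v1 memory directory in MB.
# Values are rounded down to whole MB (1 MB = 1048576 bytes).
# The default directory is only a placeholder; pass the job's cgroup directory.
show_limits() {
    dir="${1:-/sys/fs/cgroup/htcondor}"
    for name in memory.soft_limit_in_bytes \
                memory.limit_in_bytes \
                memory.memsw.limit_in_bytes; do
        if [ -r "$dir/$name" ]; then
            bytes=$(cat "$dir/$name")
            echo "$name: $((bytes / 1048576)) MB"
        else
            echo "$name: not available"
        fi
    done
}
```

A soft limit of exactly 4096MB would show up here as 4294967296 bytes in the raw file.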

Tom