Do you have memory.use_hierarchy enabled? The command to see
would be something like:
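A minimal sketch, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory (on RHEL6 it is usually /cgroup/memory instead). The CGROUP_MEM_ROOT variable and the helper name are just for illustration; adjust the path for your system:

```shell
# Assumed cgroup v1 mount point; override CGROUP_MEM_ROOT if yours differs
# (for example /cgroup/memory on RHEL6).
CGROUP_MEM_ROOT="${CGROUP_MEM_ROOT:-/sys/fs/cgroup/memory}"

# Print the contents of memory.use_hierarchy under the given directory:
# 1 means hierarchical accounting is enabled, 0 means it is not.
show_use_hierarchy() {
    if [ -r "$1/memory.use_hierarchy" ]; then
        cat "$1/memory.use_hierarchy"
    else
        echo "memory.use_hierarchy not found under $1" >&2
        return 1
    fi
}

show_use_hierarchy "$CGROUP_MEM_ROOT" || true
```

The same file exists in each child cgroup, so you can point the helper at the directory of the cgroup Condor uses as well.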
Some kernels support this and others don't. On those that
don't, Condor 8.2.9 would silently fail to register its OOM
killer (even though there are some notes in the release saying
use_hierarchy should work), so the cgroups configuration and
the kernel wouldn't work together well. That ultimately
manifests as a subset of jobs failing to be killed and being
left in the D state.
It's part of kernel 3.16, but RHEL6 may have backported it. It
looks to me like RHEL7 backported it to 3.10 (but don't quote
me, as I ...)
I also see that your soft limit, hard limit and RAM+swap
limit are all set, but to similar rather than identical
values. The soft RAM limit appears to
be 4096MB on the dot while the hard memory and swap limits
appear to be below 4000MB (and different by a small amount).
Are those the values you expect for your configuration?
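If it helps to compare them side by side, here is a rough sketch that prints the three limits in MB. It assumes cgroup v1; the JOB_CGROUP default is a hypothetical path, so set it to the actual cgroup directory of one of your jobs. memory.memsw.limit_in_bytes is only present when swap accounting is enabled in the kernel.

```shell
# Hypothetical job cgroup directory; set JOB_CGROUP to the real one.
JOB_CGROUP="${JOB_CGROUP:-/sys/fs/cgroup/memory/htcondor}"

# Convert a byte count to whole megabytes (integer division).
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

# Soft limit, hard limit, and RAM+swap limit, in that order.
for f in memory.soft_limit_in_bytes memory.limit_in_bytes memory.memsw.limit_in_bytes; do
    if [ -r "$JOB_CGROUP/$f" ]; then
        echo "$f: $(bytes_to_mb "$(cat "$JOB_CGROUP/$f")") MB"
    else
        echo "$f: not readable under $JOB_CGROUP" >&2
    fi
done
```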