Re: [HTCondor-users] Out of memory killer & cgroups



On 10/02/2014 09:19 AM, Rich Pieri wrote:
The Linux kernel OOM killer is independent of cgroups. The only interaction with cgroups is that a cgroup memory limit may cause the OOM killer to activate sooner than it would without a constrained memory limit. SIGKILL (kill -9) is immediate and it cannot be trapped or ignored. The killed process does not have a chance to write out any logs or otherwise clean up after itself so there's nothing that it can do to let users know why it was killed. What the parent does is up to the parent, although right off it has no way to distinguish between a KILL signal sent by the kernel and a KILL signal sent by a user. So, on the face of it, the behavior that you are seeing is something that I would expect to see. Whether or not it's the intended behavior is something that I will leave to the Condor devs to address.

Note that this isn't entirely correct. A process can register with the cgroup memory controller to be notified when the per-cgroup OOM killer fires. This is what the condor_starter does, so that it can distinguish the OOM killer firing from some other reason the job was killed with SIGKILL (kill -9).
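For anyone curious what that registration looks like, here is a minimal sketch of the cgroup v1 memory controller interface: create an eventfd, open memory.oom_control, and write both descriptor numbers to cgroup.event_control. This is not the condor_starter's actual code. The function names and the cgroup path in the usage note are made up for illustration, and it assumes Linux with a cgroup v1 memory hierarchy.

```python
import ctypes
import os
import struct


def _new_eventfd():
    """Create an eventfd (Linux only); use os.eventfd when Python provides it."""
    if hasattr(os, "eventfd"):  # available in Python 3.10+
        return os.eventfd(0)
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.eventfd(0, 0)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "eventfd() failed")
    return fd


def register_oom_eventfd(cgroup_dir):
    """Ask the cgroup v1 memory controller to signal an eventfd on OOM.

    cgroup_dir is the directory of one memory cgroup (hypothetical path,
    supplied by the caller). Returns (event_fd, oom_control_fd); both must
    stay open for as long as notifications are wanted.
    """
    event_fd = _new_eventfd()
    oom_fd = os.open(os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY)
    # The kernel expects the line "<event_fd> <fd of memory.oom_control>".
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
        ctl.write(f"{event_fd} {oom_fd}")
    return event_fd, oom_fd


def wait_for_oom(event_fd):
    """Block until the eventfd is signalled; return its 64-bit counter value."""
    return struct.unpack("=Q", os.read(event_fd, 8))[0]
```

A supervisor would call register_oom_eventfd("/sys/fs/cgroup/memory/<job cgroup>") before starting the job and then watch the returned event_fd (for example with select) alongside the child's exit status. If the eventfd fired before the child died from signal 9, the kill most likely came from the per-cgroup OOM killer rather than from a user.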

-Greg