Re: [HTCondor-users] CGROUPS + OOM / HOLD on exit
- Date: Wed, 31 Jul 2013 12:02:04 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] CGROUPS + OOM / HOLD on exit
On 7/30/2013 2:51 PM, Todd Tannenbaum wrote:
On 7/30/2013 1:01 PM, Brian Bockelman wrote:
I think I have figured out why you're hitting this and not us locally.
The cgroup API generates a notification whenever the OOM occurs *or*
the cgroup is removed. Locally, we pre-create all the possible cgroups
(older RHEL kernels would crash if the cgroups were not pre-created;
this has since been fixed), so condor never deletes them.
Hence, we never received the removal notification, and this issue was
missed in testing.
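
To illustrate the distinction described above: because the same
notification fires on an OOM and on cgroup removal, a listener has to
check which one happened before putting a job on hold. The following is
a minimal sketch, not the patch that went into HTCondor. The function
name and the returned labels are hypothetical, and it assumes that a
removed cgroup can be recognized by its directory no longer existing.

```python
import os


def classify_cgroup_notification(cgroup_dir):
    """Guess why a cgroup OOM notification fired.

    Hypothetical helper: returns "removed" if the cgroup directory is
    gone (the notification was a side effect of cgroup removal), and
    "oom" if the directory still exists (the notification is assumed
    to be a real out-of-memory event).
    """
    if not os.path.isdir(cgroup_dir):
        # The cgroup was deleted, e.g. during normal job cleanup.
        return "removed"
    # The cgroup is still present, so treat the event as an OOM.
    return "oom"
```

With pre-created cgroups the directory is never deleted, so the
"removed" branch is never exercised, which matches why the problem did
not show up locally.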
I created a ticket for this at
Once we receive a CLA from Joan we will incorporate the patch posted
here into the codebase.
CLA received. The patch has been pushed into the code and will be
released starting with HTCondor v8.0.2.
Joan, thank you so much for your debugging help and the patch to
HTCondor! Folks like yourself are making the software better and better.