
Re: [HTCondor-users] CGROUPS + OOM / HOLD on exit



On 7/30/2013 2:51 PM, Todd Tannenbaum wrote:
On 7/30/2013 1:01 PM, Brian Bockelman wrote:
Hi Joan,

I think I have figured out why you're hitting this and we aren't locally.
The cgroup API generates a notification whenever an OOM occurs *or*
the cgroup is removed.  Locally, we pre-create all the possible cgroups
(older RHEL kernels would crash if the cgroups were not pre-created;
this has since been fixed), so condor never deletes them.
Hence, we never got a notification for cgroup removal, and
this issue was missed in testing.
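
To illustrate the distinction described above, here is a minimal sketch (not the patch from this thread) of how a listener might tell the two notifications apart. It assumes a cgroup v1 memory controller, where the cgroup is a directory containing a memory.oom_control file, and it assumes that a missing directory means the cgroup was removed. The function name classify_notification and the example path are hypothetical.

```python
import os


def classify_notification(cgroup_dir):
    """Guess why an OOM-listener notification fired for cgroup_dir.

    Assumption: if the cgroup's memory.oom_control file no longer exists,
    the cgroup was removed, so the notification is not a real OOM.
    """
    control_file = os.path.join(cgroup_dir, "memory.oom_control")
    if not os.path.exists(control_file):
        # The cgroup is gone: treat this as a removal, not an OOM.
        return "removed"
    # The cgroup still exists: treat the notification as an OOM event.
    return "oom"


if __name__ == "__main__":
    # Hypothetical path; the real location depends on the site's cgroup setup.
    print(classify_notification("/sys/fs/cgroup/memory/htcondor/example_slot"))
```

A job would only be put on hold for exceeding memory when the result is "oom"; a "removed" result would simply be ignored on job exit.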
[snip]

I created a ticket for this at
   https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3824

Once we receive a CLA from Joan we will incorporate the patch posted
here into the codebase.


The CLA has been received and the patch has been pushed into the code; it will be released starting with HTCondor v8.0.2.

Joan, thank you so much for your debugging help and the patch to HTCondor! Folks like you are making the software better and better.

-Todd