
Re: [HTCondor-users] CGROUPS + OOM / HOLD on exit



Thank you for your supportive words. I'm glad to have been able to contribute something back to the community, and I'll keep doing my best to improve HTCondor!

Joan

On 31/07/13 19:02, Todd Tannenbaum wrote:
On 7/30/2013 2:51 PM, Todd Tannenbaum wrote:
On 7/30/2013 1:01 PM, Brian Bockelman wrote:
Hi Joan,

I think I have figured out why you're hitting this and not us locally.
The cgroup API generates a notification whenever the OOM occurs *or*
the cgroup is removed.  Locally, we pre-create all the possible cgroups
(older RHEL kernels would crash if the cgroups were not pre-created;
this has since been fixed), so condor never deletes the cgroup.
Hence, we never got the notification when the cgroup was removed and
this issue was missed in testing.
[snip]
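
To illustrate the behaviour described above: because the same notification fires both on an OOM and on cgroup removal, a listener has to check which of the two actually happened. The following is only a minimal sketch of that check, not the patch from this thread. It assumes a cgroup v1 memory controller, where memory.oom_control contains an "under_oom" field; the function name classify_wakeup and its return values are hypothetical.

```python
import os


def classify_wakeup(cgroup_dir):
    """Guess why an OOM notification fired for a cgroup v1 memory cgroup.

    Returns "removed" if the cgroup directory no longer exists,
    "oom" if memory.oom_control reports under_oom 1, and
    "unknown" otherwise. All three labels are illustrative.
    """
    # A missing directory means the wakeup came from cgroup removal.
    if not os.path.isdir(cgroup_dir):
        return "removed"

    fields = {}
    try:
        with open(os.path.join(cgroup_dir, "memory.oom_control")) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:
                    fields[parts[0]] = parts[1]
    except OSError:
        # The file vanished between the two checks: treat it as removal.
        return "removed"

    # under_oom is only 1 while the cgroup is still out of memory, so a
    # value of 0 does not prove that no OOM happened.
    return "oom" if fields.get("under_oom") == "1" else "unknown"
```

A listener would call this after its read on the notification descriptor returns, and skip putting the job on hold when the result is "removed".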

I created a ticket for this at
   https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3824

Once we receive a CLA from Joan we will incorporate the patch posted
here into the codebase.


CLA received. The patch has been pushed into the code and will be released starting with HTCondor v8.0.2.

Joan, thank you so much for your debugging help and the patch to HTCondor! Folks like yourself are making the software better and better.

-Todd