Re: [HTCondor-users] HTCondor/cgroups: limiting CPUs/pinning processes to CPUs with hyperthreaded CPUs

> On Feb 4, 2016, at 10:21 AM, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
> Hi Michael,
> yes - I had some hands on with cgroups on Univa SGE and they really got
> useful.
> Actually, my original question arose, when I noticed a user complaining
> in another mailing list, that his jobs got killed at another site -
> rightly, I guess, since he was running a 'make -j32' while requesting
> one core...
> I am really looking forward to letting cgroups take care of such human forms
> as your Matlab cases, so that I do not have to worry much about
> thread/memory/... bombs anymore ;)

Hi Thomas,

Actually, there should be no problem running “make -j32” on one core, other than that it will run at slightly less than 1/32 of the speed the user expected!

It probably got killed for memory usage.  The kernel in RHEL6.x has a poor interaction between dirty pages in the page cache and cgroups (basically, when a cgroup runs out of memory, the kernel doesn’t consider dirty pages recoverable).  If you’re seeing heavy-IO jobs getting killed due to memory, try setting the dirty page limit to a small absolute value (such as 100MB) rather than a percentage of system memory (which is often larger than the cgroup memory allocation).
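
To make the arithmetic concrete, here is a rough Python sketch comparing a percentage-based dirty page limit with a job's cgroup allocation.  The node size (64 GiB), the 20% ratio, and the 2 GiB slot are illustrative assumptions, not numbers from this thread, and I'm treating "100MB" as 100 MiB.  The kernel applies the percentage to available memory rather than total RAM, so the first figure is only an upper-bound estimate.  vm.dirty_bytes is the absolute counterpart of vm.dirty_ratio; only one of the two is in effect at a time.

```python
# Sketch: why a percent-of-system dirty limit can exceed a cgroup's memory limit.
# All sizes here are hypothetical examples.

MiB = 1024 ** 2
GiB = 1024 ** 3


def ratio_dirty_limit(system_ram_bytes, dirty_ratio_percent):
    """Approximate dirty page ceiling when the limit is a percent of system memory.

    Uses total RAM as a stand-in for the kernel's "available memory",
    so the result is an upper-bound estimate.
    """
    return system_ram_bytes * dirty_ratio_percent // 100


def exceeds_cgroup(dirty_limit_bytes, cgroup_limit_bytes):
    """True if dirty pages alone could fill the cgroup before writeback is forced."""
    return dirty_limit_bytes >= cgroup_limit_bytes


def sysctl_line(dirty_bytes):
    """Render a sysctl.conf-style line that sets an absolute dirty page limit."""
    return "vm.dirty_bytes = %d" % dirty_bytes


if __name__ == "__main__":
    node_ram = 64 * GiB   # assumed execute node size
    job_limit = 2 * GiB   # assumed cgroup memory allocation for one job

    by_ratio = ratio_dirty_limit(node_ram, 20)  # assumed vm.dirty_ratio of 20
    print("ratio-based limit exceeds job cgroup:", exceeds_cgroup(by_ratio, job_limit))

    absolute = 100 * MiB
    print("100 MiB limit exceeds job cgroup:", exceeds_cgroup(absolute, job_limit))
    print(sysctl_line(absolute))
```

With these assumed numbers the ratio-based limit is several times the job's whole allocation, while the absolute limit stays well under it.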

Alternatively, consider using “soft” enforcement to give users a bit more breathing room before preemption.
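
A minimal configuration sketch for the execute node, assuming your HTCondor version supports the CGROUP_MEMORY_LIMIT_POLICY knob and that the cgroup named by BASE_CGROUP already exists on the machine (the name "htcondor" here is just the conventional choice; check the manual for your version):

```
# Local config on the execute node; values below are assumptions to adapt.
BASE_CGROUP = htcondor
CGROUP_MEMORY_LIMIT_POLICY = soft
```

With the soft policy, a job can go over its requested memory as long as the machine still has free memory, instead of hitting the limit as soon as it crosses its request.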