
Re: [HTCondor-users] Limiting HTCondor total RAM usage



> On Feb 24, 2015, at 11:30 AM, Brian Candler <b.candler@xxxxxxxxx> wrote:
> 
> On 24/02/2015 16:07, Greg Thain wrote:
>>> Unfortunately, I spoke too soon. What happened then is that oom_killer started killing processes when I don't think it should.
>>> 
>>> I can cat memory.usage_in_bytes every second and I see it slowly creeping up to 1.6G or 1.7G, and then suddenly the process dies, and dmesg shows a splurge of oom_killer backtrace output.
>>> 
>> 
>> Can you share the specific dmesg output about the oom killer?
> Sure. One is below. Note: these jobs do a lot of writing (mainly output to NFS), and there might be a lot of dirty pages in the cache waiting to be flushed, which is what the backtrace suggests to me. Do dirty VFS pages count towards a cgroup's memory utilisation?
> 
> However what I don't understand is why I get oom killers with this config:
> 
> BASE_CGROUP = htcondor
> CGROUP_MEMORY_LIMIT_POLICY = none
> 
> but get no oom killers when those two lines are removed. The only difference then is whether condor creates cgroups for its slots, but they should be non-enforcing cgroups.
> 

Hi Brian,

This rings a bell.  Kernels of that vintage mark dirty cache pages as un-evictable, but don't consider flushing them as a mechanism for relieving memory pressure when a cgroup needs more memory.  If a job accumulates too many dirty pages (i.e., it writes very quickly), all of its memory becomes un-evictable and the OOM killer fires.

IIRC, UW was seeing cases where a write-heavy process could generate enough dirty pages that it was almost always killed.  This is because the dirty page limit defaults to a percentage of total memory (on a modern worker node, that can be multiple GB of RAM), as opposed to a fixed byte limit.

Later kernels in the 3.x series fix this bug, and also change the defaults from a percentage of total memory to an absolute number of bytes.

Try looking at:

vm.dirty_bytes (instead of vm.dirty_ratio)
vm.dirty_background_bytes (instead of vm.dirty_background_ratio)
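
As a sketch of how you might inspect and switch these tunables (the byte values below are illustrative placeholders, not recommendations; pick limits appropriate for your nodes):

```shell
# Show the current writeback thresholds. Only one of each pair is active:
# setting the *_bytes knob to a non-zero value disables the *_ratio knob.
sysctl vm.dirty_ratio vm.dirty_background_ratio
sysctl vm.dirty_bytes vm.dirty_background_bytes

# Example: cap dirty pages at fixed byte values instead of a % of RAM.
# 256 MB hard limit, 64 MB background flush threshold (illustrative only).
sudo sysctl -w vm.dirty_bytes=268435456
sudo sysctl -w vm.dirty_background_bytes=67108864

# To persist across reboots, add the same settings to /etc/sysctl.conf
# (or a file under /etc/sysctl.d/):
#   vm.dirty_bytes = 268435456
#   vm.dirty_background_bytes = 67108864
```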

I'm not familiar with the Debian defaults.

Hopefully this is sending you in the right direction!

Brian