Re: [HTCondor-users] Limiting HTCondor total RAM usage
- Date: Tue, 24 Feb 2015 12:54:50 -0600
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Limiting HTCondor total RAM usage
> On Feb 24, 2015, at 11:30 AM, Brian Candler <b.candler@xxxxxxxxx> wrote:
>
> On 24/02/2015 16:07, Greg Thain wrote:
>>> Unfortunately, I spoke too soon. What happened then is that oom_killer started killing processes when I don't think it should.
>>>
>>> I can cat memory.usage_in_bytes every second and I see it slowly creeping up to 1.6G or 1.7G, and then suddenly the process dies, and dmesg shows a splurge of oom_killer backtrace output.
>>>
>>
>> Can you share the specific dmesg output about the oom killer?
> Sure. One is below. Note: these jobs do a lot of writing (mainly output to NFS), and there might be a lot of dirty pages in the cache waiting to be flushed, which is what the backtrace suggests to me. Do dirty VFS pages count towards a cgroup's memory utilisation?
>
> However what I don't understand is why I get oom killers with this config:
>
> BASE_CGROUP = htcondor
> CGROUP_MEMORY_LIMIT_POLICY = none
>
> but get no oom killers when those two lines are removed. The only difference then is whether condor creates cgroups for its slots, but they should be non-enforcing cgroups.
>
Hi Brian,
This rings a bell. Kernels of that vintage mark dirty page-cache pages as un-evictable, but don't consider writing them back as a way to relieve memory pressure when a cgroup needs more memory. If a job dirties pages quickly enough (i.e., jobs writing very fast), all of its memory becomes un-evictable and the OOM killer fires.
IIRC, UW was seeing cases where a write-heavy process could generate enough dirty pages that it was almost always killed. This is because the dirty-page limit defaults to a percentage of total memory (on a modern worker node, that can be multiple GB of RAM) rather than a fixed byte count.
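A quick way to see whether dirty pages are what's filling the cgroup is to read memory.stat alongside memory.usage_in_bytes. This is only a sketch: it assumes cgroup v1 mounted at /sys/fs/cgroup/memory with the htcondor hierarchy from your config, and the dirty/writeback fields only appear on kernels that do per-cgroup dirty accounting:

```shell
# show_mem: pull the page-accounting fields out of a memory.stat stream.
show_mem() {
    grep -E '^(cache|rss|dirty|writeback) '
}

# Usage (path assumes the htcondor BASE_CGROUP from your config):
#   show_mem < /sys/fs/cgroup/memory/htcondor/memory.stat
```

If cache dwarfs rss while usage_in_bytes keeps climbing, the pressure is coming from the page cache rather than the job's anonymous memory.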
In the 3.x series, the Linux kernel fixes this bug and lets you switch the limits from a percentage of total memory to an absolute number of bytes.
Try looking at:
- vm.dirty_bytes (instead of vm.dirty_ratio)
- vm.dirty_background_bytes (instead of vm.dirty_background_ratio)
I'm not familiar with the Debian defaults.
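As a sketch, the byte-based limits can be set with sysctl; the 64 MB / 16 MB values below are purely illustrative, not recommendations, and the commands need root:

```shell
# Illustrative values only -- tune for your workload and disks.
# Note: setting a *_bytes knob automatically zeroes the matching *_ratio knob.
sysctl -w vm.dirty_bytes=67108864             # force synchronous writeback past ~64 MB dirty
sysctl -w vm.dirty_background_bytes=16777216  # start background writeback at ~16 MB dirty
```

To persist across reboots, the same key = value lines (without `sysctl -w`) can go in /etc/sysctl.conf or a file under /etc/sysctl.d/.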
Hopefully this is sending you in the right direction!
Brian