
Re: [HTCondor-users] Memory accounting issue with cgroups




On 5/20/23 5:03 AM, Marco van Zwetselaar wrote:


I guess my mental picture of memory.high as a yellow card, and memory.max as the red card was incorrect. It's more like rugby: the referee's stare is enough. :-)


Hi Marco:

I'm glad it is working for you now. We don't have a lot of experience with the policy settings for cgroup v2, and would be eager to hear experiences or advice on what they should be set to. The kernel docs are a little vague about the difference between "high" and "max", saying that usually a cgroup gets OOM killed when it hits "high", but in some cases can go all the way up to "max" before the OOM arrives. It isn't clear to me whether this means a page or two more memory, just enough to deliver the signal, or some potentially unbounded amount. Given that, I chose to have condor only set "max".

If you will excuse me stretching your metaphor, "high" is the moment the red card goes into the air, but "max" is when the guilty party actually leaves the pitch. "memory.min" is like our youth leagues here, where there is an unwritten understanding that if one team can't field some minimum number of players (seven?), the opposing team (if able) will loan them some players in order that the kids can still get a game in (despite a forfeit on the books). And I have no good idea right now what htcondor should set "memory.low" to.
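
For anyone who wants to see what a job's cgroup actually has set for these knobs, here is a minimal sketch in Python. It only reads the standard cgroup v2 interface files named in this thread; the function name and the directory argument are made up for illustration, and this is not how HTCondor itself reads or sets them.

```python
from pathlib import Path

# The cgroup v2 memory interface files discussed in this thread.
MEMORY_FILES = ("memory.min", "memory.low", "memory.high",
                "memory.max", "memory.swap.max")

def read_memory_limits(cgroup_dir):
    """Map each file present in cgroup_dir to its limit in bytes.

    The literal value "max" (no limit) is returned as None.
    Files that do not exist in the directory are skipped.
    """
    limits = {}
    for name in MEMORY_FILES:
        path = Path(cgroup_dir) / name
        if not path.exists():
            continue
        raw = path.read_text().strip()
        limits[name] = None if raw == "max" else int(raw)
    return limits
```

Point it at the job's cgroup directory under /sys/fs/cgroup (the exact path depends on the site's configuration) and print the result.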



On a side note to the Condor devs: my config has 'DISABLE_SWAP_FOR_JOB = true'. Shouldn't that translate to 'memory.swap.max = 0' on the cgroup (currently shows "max")?


The cgroup v2 code path doesn't set this. I'll write a PR to fix this.
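
In the meantime, a rough sketch of checking (or setting by hand) what that fix would amount to at the cgroup level. The helper names are hypothetical, the directory is whatever cgroup the job lives in, and writing to a real cgroup needs the appropriate permissions; this is an illustration, not the HTCondor code.

```python
from pathlib import Path

def swap_is_disabled(cgroup_dir):
    """True if memory.swap.max in cgroup_dir is exactly 0."""
    value = (Path(cgroup_dir) / "memory.swap.max").read_text().strip()
    return value == "0"

def disable_swap(cgroup_dir):
    """Write 0 to memory.swap.max so the cgroup may not use swap."""
    (Path(cgroup_dir) / "memory.swap.max").write_text("0\n")
```

The kernel's default for memory.swap.max is "max", which matches what Marco sees, so swap_is_disabled() would return False on his job cgroups today.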


Thanks,

-greg