
Re: [HTCondor-users] Memory accounting issue with cgroups



Thanks Greg,

I've been experimenting a bit, using the information in the cgroup v2 overview at https://facebookmicrosites.github.io/cgroup2/docs/overview.html.

The "memory.max" setting in 10.6 worked for one job (keeping its RSS within request_mem), but for another type of job it almost immediately OOM-killed all instances. From the condor viewpoint this was correct (in principle): the instances were neatly put on Hold. (The only minor inconvenience is that the system sends an email for every OOM.)
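For the record, this is roughly how I've been peeking at a job's usage versus its limit. It's just a sketch in Python (any way of reading the cgroup files will do), and the cgroup path is a made-up example; substitute whatever path the starter actually created for the job on your machine:

    # Sketch: compare a job cgroup's current usage against its memory.max.
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/htcondor/job_123")         # hypothetical path

    max_raw = (cg / "memory.max").read_text().strip()    # "max" or bytes
    current = int((cg / "memory.current").read_text())   # bytes in use

    limit = None if max_raw == "max" else int(max_raw)
    if limit:
        print(f"usage: {current / limit:.1%} of memory.max ({limit} bytes)")
    else:
        print(f"usage: {current} bytes, no memory.max limit set")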

However, when I manually set memory.high to 50% of memory.max, the offending jobs crept up to about 110% of that level and kept running. The memory.pressure (see the doc at https://facebookmicrosites.github.io/cgroup2/docs/pressure-metrics.html) then slowly rose to 98.5%, supposedly meaning that the job was spending 98.5% of its time stalled waiting for memory pages to be swapped back in.
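In case anyone wants to watch the same numbers: memory.pressure is two PSI lines, "some" and "full", each with 10s/60s/300s averages. A minimal parsing sketch, with the same caveat that the cgroup path below is a made-up example:

    # Sketch: print the PSI averages from a cgroup's memory.pressure file.
    # Lines look like: "some avg10=98.50 avg60=97.20 avg300=90.01 total=..."
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/htcondor/job_123")          # hypothetical path

    for line in (cg / "memory.pressure").read_text().splitlines():
        kind, *fields = line.split()
        stats = dict(f.split("=") for f in fields)
        print(f"{kind:>4}: avg10={stats['avg10']}%  avg60={stats['avg60']}%")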

This is on a machine with no disk swap and plenty of spare memory above memory.max (256G requested out of 768G), so I suppose no actual swapping was taking place, just physical pages being marked "free" and then "taken" again. (Or something like that; I'm no expert in Linux kernel memory management.)

Increasing memory.high to 90% of memory.max made consumption creep up again, levelling just below memory.max, with no OOMs. Reducing memory.high also worked, and consumption would go down again. Very neat.

Unfortunately, I didn't try lifting memory.high while the job was at memory.max, to see whether memory.max alone would pressure the job before OOM-killing it (provided it approached memory.max slowly). The overview's description of memory.max appears to suggest this: "if [memory consumption] reaches this limit **and can't be reduced** [then OOM ensues]".

I'm in the dark about what "and can't be reduced" means. The OOM came almost immediately after the job started, whereas with memory.high set at 90% of max, the job ran to completion.

Either way, it would seem that setting memory.high to ~90% of memory.max would be appropriate. I haven't yet thought about memory.min/low.
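For completeness, this is the sort of thing I did by hand while experimenting. A sketch only: it needs root, the cgroup path is again a made-up example, and ideally condor would set this itself:

    # Sketch: set memory.high to 90% of the cgroup's existing memory.max.
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/htcondor/job_123")          # hypothetical path

    max_raw = (cg / "memory.max").read_text().strip()
    if max_raw != "max":                      # only if a hard limit is set
        high = int(int(max_raw) * 0.9)
        (cg / "memory.high").write_text(f"{high}\n")
        print(f"memory.high set to {high} bytes (90% of memory.max)")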

Cheers
Marco



On 20/05/2023 23:43, Greg Thain via HTCondor-users wrote:

On 5/20/23 5:03 AM, Marco van Zwetselaar wrote:
I guess my mental picture of memory.high as a yellow card, and memory.max as the red card was incorrect. It's more like rugby: the referee's stare is enough. :-)

The kernel docs are a little vague about the difference between "high" and "max", saying that usually a cgroup gets OOM killed when it hits "high", but in some cases can go all the way up to "max" before the OOM arrives. It isn't clear to me if this means maybe a page or two more memory, in order to deliver the signal, or potentially some unbounded amount of memory. Given that, I chose to have condor only set "max".

If you will excuse me stretching your metaphor, "high" is the moment the red card goes into the air, but "max" is when the guilty party actually leaves the pitch. "memory.min" is like our youth leagues here, where there is an unwritten understanding that if one team can't field some minimum number of players (seven?), the opposing team (if able) will loan them some players in order that the kids can still get a game in (despite a forfeit on the books). And I have no good idea right now what htcondor should set "memory.low" to.



On a side note to the Condor devs: my config has 'DISABLE_SWAP_FOR_JOB = true'. Shouldn't that translate to 'memory.swap.max = 0' on the cgroup (currently shows "max")?


The cgroup v2 code path doesn't set this. I'll write a PR to fix this.
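In the meantime the same effect can be had by hand by writing 0 into the job cgroup's memory.swap.max. A rough sketch (the cgroup path is a made-up example, and this needs root):

    # Sketch: forbid swap for a cgroup by setting memory.swap.max to 0.
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/htcondor/job_123")          # hypothetical path
    (cg / "memory.swap.max").write_text("0\n")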


Thanks,

-greg


