
Re: [HTCondor-users] Limiting HTCondor total RAM usage



> On Feb 17, 2015, at 6:47 AM, Brian Candler <b.candler@xxxxxxxxx> wrote:
> 
> Does anyone have experience in hard-limiting condor's *total* RAM usage, e.g. by putting the condor_startd process inside a cgroup?
> 
> I have some machines which HTCondor needs to share with some critical background tasks, and I need to avoid those tasks being hit by the OOM killer (which can happen if I throw a load of jobs into the queue and those jobs have underestimated their RAM usage).
> 
> I found this:
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
> 
> But if I understand it rightly, this is for enforcing a hard memory limit on individual jobs. That might work, but (a) it relies on all the RequestMemory values being individually correct, and (b) I have a lot of jobs which share memory, e.g. they mmap() the same file, and I'm not sure how a hard RequestMemory limit would interact with that.


Hi Brian,

A few comments:

1) cgroups use a "first touch" accounting model.  mmap()ed memory is charged to the first job that pulls a page into the page cache (which may or may not be the first job to run).  When that job exits, the charge migrates to the parent cgroup.
  - Note that the page cache (unless you mlock() things) is treated as evictable - those pages will be reclaimed if applications need to malloc() something.

2) If you don't trust your users to put in reasonable hard limits, you may want to look at the soft limit model: the kernel only enforces the memory limit when there is memory pressure on the host (a bit before the OOM killer fires).  See the sketch below.
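For reference, a minimal sketch of the knobs the wiki page above describes - the knob names come from that page, while the cgroup name "htcondor" is an assumption you should match to your own setup:

    # condor_config.local - a sketch, not a drop-in config
    # BASE_CGROUP names the cgroup subtree HTCondor charges jobs to
    BASE_CGROUP = htcondor
    # "soft": RequestMemory is only enforced when the host is under
    # memory pressure; "hard" would enforce it unconditionally
    CGROUP_MEMORY_LIMIT_POLICY = soft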

The upside of these approaches is that the OOM killer will delegate its work to HTCondor, and HTCondor will put the job on hold (in most cases - I can't say it'll happen 100% of the time; there are many corner cases when a system is running out of memory).

> 
> So for now I'd rather limit only the total HTCondor usage. Putting HTCondor inside a VM is one option (messy); deploying an HTCondor Docker container is another (more stuff to learn); so I wonder if using a cgroup directly might be the way to go.
> 

Underneath, Docker depends on cgroups - it's going to be the same problem.  You are going to be best served by seeing what your init system can enforce, or by trying something like cgrulesengd (the libcgroup rules daemon).
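To illustrate the libcgroup route, a sketch under some loud assumptions: the group name and the 8G cap are invented, and the rule assumes HTCondor's processes run as a dedicated "condor" user (match this to how your pool actually runs):

    # /etc/cgconfig.conf - created at boot by cgconfigparser
    group htcondor {
        memory {
            # hard cap on everything placed in this group
            memory.limit_in_bytes = 8G;
        }
    }

    # /etc/cgrules.conf - cgrulesengd moves matching processes here
    # <user>    <controllers>    <destination>
    condor      memory           htcondor/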

Per your other message about Debian Wheezy not supporting this well in the first place - unfortunately, this might mean the first step is to upgrade.



Ah-ha!  Reading your message again: you don't really care about memory usage, you care about making sure the OOM killer doesn't hit important stuff.  Newer versions of HTCondor automatically adjust each job's OOM score so the jobs are much more likely to be picked.  Look at oom_adj (on newer kernels, oom_score_adj); you can either make the critical tasks less likely to be picked by the OOM killer, or make HTCondor jobs more likely.  This kernel feature is older than the cgroup memory controller and is likely to be better supported on Wheezy.  A sketch follows below.
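As a sketch only - "mycriticald" is a made-up daemon name, and this assumes a single running instance of it:

    # Shield a critical process from the OOM killer (run as root).
    # -1000 means "never kill"; something like -500 merely biases the choice.
    echo -1000 > /proc/$(pidof mycriticald)/oom_score_adj

    # On kernels too old for oom_score_adj, use oom_adj instead
    # (range -17..15; -17 disables OOM killing for that process):
    echo -17 > /proc/$(pidof mycriticald)/oom_adj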

Brian