
Re: [HTCondor-users] Using cgroups to limit job memory



Brian,

Maybe CGROUP_MEMORY_LIMIT_POLICY = soft can be of use. I also have a number of follow-up questions arising from Todd Tannenbaum's reply to my query, which I'm going to ask about separately.

We are not going to run the development release series, but it's useful to see what's coming down the line.

Thanks for your response.

Roderick

On 01/04/15 18:27, Brian Bockelman wrote:
Hi Roderick,

Maybe it would be reasonable to set:

CGROUP_MEMORY_LIMIT_POLICY = soft

This allows the job to dip into free RAM (avoiding swap); when the machine runs out of memory, jobs over their allocation are in danger of being killed.
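
For reference, on the execute node that amounts to the following (a minimal sketch; BASE_CGROUP = htcondor is assumed from the manual's standard cgroup setup, not from this thread):

# condor_config on the execute node -- minimal sketch
BASE_CGROUP = htcondor
CGROUP_MEMORY_LIMIT_POLICY = soft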

In the 8.2.x series with the "hard" policy, jobs will indeed obey their RAM limits - but each can use up to 100% of the machine's total swap*.  That might be more swap than intended.

Starting in 8.3.1, you can set PROPORTIONAL_SWAP_ASSIGNMENT = true on the startd**.  This defaults each job's swap limit to the same fraction of total swap as the job's share of total RAM.

So, if a job requests 2 GB RAM on a machine with 16 GB RAM / 8 GB swap, the job will receive 2 GB RAM and 1 GB swap.
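
In configuration terms, that would be (a sketch; PROPORTIONAL_SWAP_ASSIGNMENT is the undocumented knob mentioned above):

# condor_config on the startd, 8.3.1 or later -- sketch
CGROUP_MEMORY_LIMIT_POLICY = hard
PROPORTIONAL_SWAP_ASSIGNMENT = true
# e.g. 2 GB requested / 16 GB total RAM = 12.5%,
# so the job's swap limit is 12.5% of 8 GB swap = 1 GB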

Finally, if you want jobs to *never* touch swap, starting in 8.3.1, I think you can add the following to the schedd's configuration:

SUBMIT_EXPRS = $(SUBMIT_EXPRS) VirtualMemory
VirtualMemory = RequestMemory

(recipe untested but looks correct).  I suppose you could also allow a small amount of swap:

VirtualMemory = RequestMemory + 10

(values are in MB).  Once the job uses RAM past its limit, it should go on hold.
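
For example, with the VirtualMemory = RequestMemory recipe in place, a submit file like this (a sketch; the executable is a placeholder):

# job submit file -- sketch
executable = myjob.sh
request_memory = 2048
queue

should give the job a 2048 MB RAM limit and an equal virtual memory cap, i.e. no headroom for swap; with the RequestMemory + 10 variant the cap would be 2058 MB instead.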

Hope this is helpful,

Brian

* Locally, we've found swap to be useless for worker nodes, and we simply remove all swap devices.

** I can't find documentation for this.

On Apr 1, 2015, at 9:20 AM, Roderick Johnstone <rmj@xxxxxxxxxxxxx> wrote:

Hi

I'm using HTCondor 8.2.7 on Red Hat 6.6 and have set up cgroups as per the manual so that jobs with many processes cannot take too much memory. I have set CGROUP_MEMORY_LIMIT_POLICY = hard.

When I specify, e.g., request_memory = 100M in the job submit file, the job is indeed limited to 100M of resident memory.
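
Concretely, the setup is roughly this (a sketch; BASE_CGROUP = htcondor follows the manual's recipe, and the executable is a placeholder):

# condor_config on the execute node -- sketch
BASE_CGROUP = htcondor
CGROUP_MEMORY_LIMIT_POLICY = hard

# job submit file -- sketch
executable = myjob.sh
request_memory = 100M
queue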

While this behaviour is good for the machine owner, it's less than ideal for the job owner: the job may continue running, but only very slowly, because it's paging heavily. This condition might not be obvious to the job owner.

Although this seems to be the behaviour documented in the manual, I'm sure I've seen a description of a configuration in which the job is placed on hold, with a suitable message, if it tries to allocate more memory than it requested; I can't find that now, though.

So, is it possible to configure what happens when a job exceeds its requested memory?

Thanks

Roderick Johnstone