
Re: [HTCondor-users] Using cgroups to limit job memory



Todd

Thanks for your reply. See embedded replies...

Roderick

On 01/04/15 18:24, Todd Tannenbaum wrote:
On 4/1/2015 9:20 AM, Roderick Johnstone wrote:
Hi

I'm using HTCondor 8.2.7 on Redhat 6.6 and have set up cgroups as per
the manual so that jobs with many processes cannot take too much memory.
I have CGROUP_MEMORY_LIMIT_POLICY = hard

When I specify eg request_memory=100M in the job submit file the job is
indeed limited to 100M of resident memory.

While this behaviour is good for the machine owner, it's less than ideal
for the job owner, since the job may continue but only very slowly because
it's paging a lot. This condition might not be obvious to the job owner.

Although this seems to be the behaviour documented in the manual, I'm
sure I have seen a description of a configuration in which the job can
be placed on hold with a suitable message if it tries to allocate more
memory than it requests, although I can't find that now.


The HOWTO recipes are your friend.  From the HTCondor.org homepage look
for "HOWTO recipes"; the direct link is
   https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAdminRecipes

Thanks for the pointer, yes these are really useful.

Specifically I think you'll find this one of interest
   https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
as it gives examples on how to preempt and/or place jobs on hold that
use too much memory.

OK, I think I understand this, but it would be good to have clarification on one thing, please:

1) Does MemoryUsage track the resident memory usage (i.e. excluding any virtual memory) of the whole job process tree when cgroups are configured?

If so, would something like the following (based on examples from the wiki page), in an environment with cgroups enabled, place a job on hold when the job process tree allocates more resident memory than specified by the request_memory submit file attribute?

# Allow jobs to not be limited by request_memory otherwise
# this policy can never be triggered
CGROUP_MEMORY_LIMIT_POLICY=none

# hold jobs that are more than 10% over requested memory
MEMORY_EXCEEDED = ((MemoryUsage*1.1 > request_memory) =!= TRUE)
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = False
WANT_HOLD = $(MEMORY_EXCEEDED)
WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
      "Your job used more resident memory than it requested.", \
      undefined )
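
For comparison with the settings above, here is a hedged sketch of how the intended hold condition could be written. It assumes that the job ClassAd attribute set by request_memory is RequestMemory (in MB), that MemoryUsage is also reported in MB, and that PREEMPT already has a value earlier in the configuration. It has not been tested against 8.2.7. As written above, "=!= TRUE" makes MEMORY_EXCEEDED true whenever the comparison is false or undefined, and multiplying MemoryUsage by 1.1 would trigger at roughly 91% of the request rather than at 110%.

```
# Sketch only. Assumptions: RequestMemory and MemoryUsage are both in MB,
# and PREEMPT is already defined earlier in the configuration.

# Do not let cgroups cap the job at request_memory, otherwise
# the policy below can never be triggered
CGROUP_MEMORY_LIMIT_POLICY = none

# True only when usage is known and more than 10% over the request;
# "=?= TRUE" turns an undefined comparison into False
MEMORY_EXCEEDED = ((MemoryUsage > 1.1 * RequestMemory) =?= TRUE)

# Evict the job when the limit is exceeded, and hold it rather than suspend it
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = False
WANT_HOLD = ($(MEMORY_EXCEEDED))
WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
      "Your job used more resident memory than it requested.", \
      undefined )
```

Because WANT_SUSPEND is False and WANT_HOLD follows MEMORY_EXCEEDED, a job evicted for exceeding the limit would be expected to go on hold with the reason string above, while preemption for other reasons keeps its usual behaviour.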



Also likely of interest is
   https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitCpuUsage

Thanks, that's my next project!


Hope the above helps.  Also interested in any thoughts you may have to
improve the above HOWTOs.

Thanks again. See above for (minimal) feedback.

Roderick

regards,
Todd