
Re: [HTCondor-users] Using cgroups to limit job memory



If so, would something like the following (based on examples from the
wiki page), in an environment with cgroups enabled, place a job on hold
when the job's process tree allocates more resident memory than requested
by the request_memory submit file attribute?

# Don't limit jobs to request_memory; otherwise
# this policy can never be triggered
CGROUP_MEMORY_LIMIT_POLICY=none

# hold jobs that are more than 10% over requested memory
MEMORY_EXCEEDED = ((MemoryUsage*1.1 > request_memory) =!= TRUE)
PREEMPT = $(PREEMPT)) || $(MEMORY_EXCEEDED)
WANT_SUSPEND = False
WANT_HOLD = $(MEMORY_EXCEEDED)
WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
       "Your job used more resident memory than it requested.", \
       undefined )


Without actually testing the above, off the top of my head the idea
looks like it should work.  Note that the above has a syntax error in
the PREEMPT expression due to an unmatched parenthesis; you probably wanted
   PREEMPT = ($(PREEMPT)) || $(MEMORY_EXCEEDED)
Also note that jobs will not be preempted until they exhaust their
MaxJobRetirementTime, which is the time HTCondor promises to let the job
run without being preempted for any reason. So if you want to hold jobs
that exceed their memory request immediately, even if the jobs have
specified a MaxJobRetirementTime, and you are using HTCondor v8.2 or
above, you will
  https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitCpuUsage
and just replace $(CPU_EXCEEDED) with $(MEMORY_EXCEEDED).
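For reference, here is an untested sketch of the quoted policy with the PREEMPT parentheses balanced. It also makes three further changes that were not discussed in the thread, so treat them as assumptions to check against the manual for your version. First, the job ClassAd attribute that the request_memory submit command sets is named RequestMemory, so the sketch refers to that name. Second, =!= is the ClassAd "is not identical to" operator, so "(...) =!= TRUE" is true whenever the comparison is false or undefined, which would select jobs that are within their request. The sketch uses =?= TRUE instead, which also maps UNDEFINED to false. Third, "MemoryUsage*1.1 > request_memory" fires at about 91% of the request, so the factor is moved to match the "more than 10% over" comment. It does not change the MaxJobRetirementTime behaviour described above.

   # Don't limit jobs to request_memory; otherwise this policy can never trigger
   CGROUP_MEMORY_LIMIT_POLICY = none

   # True only when MemoryUsage (MB) is more than 10% above RequestMemory (MB);
   # RequestMemory is assumed to be the job attribute set by request_memory
   MEMORY_EXCEEDED = ((MemoryUsage > 1.1 * RequestMemory) =?= TRUE)
   PREEMPT = ($(PREEMPT)) || $(MEMORY_EXCEEDED)
   WANT_SUSPEND = False
   WANT_HOLD = $(MEMORY_EXCEEDED)
   WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
          "Your job used more resident memory than it requested.", \
          undefined )

For example, a job submitted with request_memory = 2048 would become eligible for the hold once its MemoryUsage exceeds 2252.8 MB (1.1 x 2048).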

Nice work Roderick, thanks for sharing!

Todd

This is only a minimal change (with syntax error!) to the example on the howto page that you pointed me at. The kudos should go to the HTCondor project for such great support.


Also likely of interest is
   https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitCpuUsage

Thanks, that's my next project!

OK, I see that when cgroups are configured I get, for free, essentially the behaviour of ASSIGN_CPU_AFFINITY = True, but with jobs allowed to use more cores when those cores are not otherwise in use. That's cool!
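The two setups being compared might look roughly like this. The sketch assumes an HTCondor 8.x execute node where cgroup support is enabled through BASE_CGROUP and the cgroup hierarchy is already mounted on the host; check both knobs against the manual for your version. As I understand it, affinity pins each job to the cores assigned to its slot even when other cores are idle. With cgroups, each job instead gets a CPU share weighted by its request_cpus, so it is held to that proportion only under contention and can use idle cores otherwise.

   # Option A: pin each job to the cores of its slot
   ASSIGN_CPU_AFFINITY = True

   # Option B: cgroup-based CPU shares; idle cores can be borrowed
   BASE_CGROUP = htcondor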

Thanks again for your responses.

Roderick