Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] cgroup error

Date: Tue, 25 Aug 2015 10:20:03 -0400
From: Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
Subject: Re: [HTCondor-users] cgroup error

Christoph Beyer: "I thought the jobs that by far exceed the memory limit would be killed and go on hold but that seems only to happen from time to time (?)"

Hi Christoph, I've been using cgroups for about the last two and a half years, and my name is on a few of the patches for them, so I can tell you from personal experience that HTCondor's cgroups configuration doesn't inherently constrain the memory and processor utilization of jobs. Rather, it gives HTCondor a 100% accurate way to track that utilization across all processes involved in the job (except for condor_ssh_to_job processes).

By default, oversubscription of CPU shares is permitted - the cgroup just ensures that when there's contention for available CPUs, each job will get at least the number of processors it specified in request_cpus. That is to say, if you run a "make -j 8" in a "request_cpus = 1" slot, the make job will be able to use 8 CPUs as long as nobody else on the machine wants them, but if the machine is full it will still get at least the one CPU it requested. This holds unless you enable CPU affinity via ASSIGN_CPU_AFFINITY or ENFORCE_CPU_AFFINITY, which prevents oversubscription.
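The "make -j 8" case above can be sketched with a small model of proportional CPU sharing. This is only an illustration, not HTCondor or kernel code: it assumes each job's cgroup weight is proportional to its request_cpus, and that the scheduler hands out CPU time by weight, passing on whatever a job does not use to the jobs that still want more. The function name share_cpus and the job names are made up for the example.

```python
def share_cpus(total_cpus, jobs):
    """Toy model of weighted CPU sharing under contention.

    jobs maps a job name to (request_cpus, demand_cpus), where demand_cpus
    is how many CPUs the job would use if nothing held it back.
    Returns a dict mapping each job name to the CPUs it ends up with.
    Assumption: the weight of each job is proportional to request_cpus.
    """
    alloc = {name: 0.0 for name in jobs}
    active = {name for name, (_, demand) in jobs.items() if demand > 0}
    remaining = float(total_cpus)

    while active and remaining > 1e-9:
        total_weight = sum(jobs[name][0] for name in active)

        # Find the jobs whose remaining demand fits inside their weighted offer.
        satisfied = set()
        for name in active:
            offer = remaining * jobs[name][0] / total_weight
            need = jobs[name][1] - alloc[name]
            if need <= offer + 1e-9:
                satisfied.add(name)

        if not satisfied:
            # Everyone wants more than their offer: split what is left by weight.
            for name in active:
                alloc[name] += remaining * jobs[name][0] / total_weight
            remaining = 0.0
            break

        # Give satisfied jobs exactly what they need; the rest is re-offered.
        for name in satisfied:
            need = jobs[name][1] - alloc[name]
            alloc[name] += need
            remaining -= need
        active -= satisfied

    return alloc


if __name__ == "__main__":
    # A "make -j 8" job in a request_cpus = 1 slot on an otherwise idle 8-core machine.
    print(share_cpus(8, {"make": (1, 8)}))

    # The same job when seven other single-CPU jobs fill the machine.
    busy = {"make": (1, 8)}
    busy.update({"job%d" % i: (1, 1) for i in range(7)})
    print(share_cpus(8, busy))
```

In this model the make job gets all 8 CPUs on the idle machine and is squeezed back to 1 CPU once seven other single-CPU jobs are busy, which matches the behavior described above.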

The same goes for memory - unless you set up the enforcement, a job will be able to use as much memory as it wants until it wedges the machine into a swap-thrashing state (Red Hat 5) or runs afoul of the Out-of-Memory Killer (Red Hat 6), both of which I've encountered. The new Docker universe in 8.3/8.4 does, however, enforce a hard memory limit by default.
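One way to see whether a hard limit is actually in force for a running job is to read the memory accounting files in the job's cgroup directory. The sketch below assumes a cgroup v1 memory controller, whose files include memory.usage_in_bytes, memory.max_usage_in_bytes and memory.limit_in_bytes. The function name, the 2**60 "no limit" threshold and the example path are assumptions for illustration; the real directory depends on where the cgroup hierarchy is mounted and how the job cgroups are named on your system.

```python
import os

# Heuristic (an assumption): cgroup v1 reports "no limit" as a value close
# to 2**63, so anything at or above 2**60 is treated as unlimited here.
NO_LIMIT_THRESHOLD = 2 ** 60


def read_cgroup_memory(cgroup_dir):
    """Read memory accounting for one cgroup v1 memory directory.

    Returns a dict with usage, peak usage and hard limit in bytes.
    A value of None means the file was missing or unreadable; for
    hard_limit_bytes it can also mean that no limit is set.
    """
    def read_int(name):
        try:
            with open(os.path.join(cgroup_dir, name)) as handle:
                return int(handle.read().strip())
        except (OSError, ValueError):
            return None

    limit = read_int("memory.limit_in_bytes")
    if limit is not None and limit >= NO_LIMIT_THRESHOLD:
        limit = None

    return {
        "usage_bytes": read_int("memory.usage_in_bytes"),
        "peak_bytes": read_int("memory.max_usage_in_bytes"),
        "hard_limit_bytes": limit,
    }


if __name__ == "__main__":
    # Hypothetical path; substitute the cgroup directory of a real job.
    print(read_cgroup_memory("/sys/fs/cgroup/memory/htcondor/example_job"))
```

If hard_limit_bytes comes back as None for a job's cgroup, that is consistent with the unenforced case described above, where the job's memory use is tracked but not capped.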

The details of limiting resource usage via cgroups are in the 8.2.9 manual, section 3.12.14, which covers CGROUP_MEMORY_LIMIT_POLICY and the like. The manual should probably mention CPU affinity in that section too, in the second paragraph on page 446 regarding CPU usage.


Michael V. Pelletier
IT Program Execution
Principal Engineer
978.858.9681 (5-9681) NOTE NEW NUMBER
339.293.9149 cell
339.645.8614 fax
michael.v.pelletier@xxxxxxxxxxxx

References:
- [HTCondor-users] cgroup error
  - From: Beyer, Christoph
