
Re: [HTCondor-users] badput

On Wed, Nov 14, 2012 at 1:27 AM, Ian Cottam <Ian.Cottam@xxxxxxxxxxxxxxxx> wrote:
A colleague just asked me:

"When a Condor node runs out of memory - to the point that it starts
evicting jobs - does it:

1. Evict the most recently started job to minimize "badput"?
2. Evict the first job that requests more memory when all the memory has
been exhausted?
3. Some other strategy?"

I'm not sure.

In my experience, it's the first job to exceed the memory available for its slot, or (if dynamic slots are in use) its RequestMemory setting.
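As a sketch of how that slot-level behavior is usually configured (illustrative, not from the original message; check your pool's version and existing policy before adopting it), memory-based eviction is typically driven by the startd's PREEMPT expression comparing the measured MemoryUsage against the job's RequestMemory:

```
# condor_config sketch -- adapt to your pool's policy.
# Evict a job once its measured memory use exceeds what it requested.
MEMORY_EXCEEDED = ( MemoryUsage =!= UNDEFINED && MemoryUsage > RequestMemory )
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
# Kill rather than suspend when the machine is short on memory.
WANT_SUSPEND = False
```

With an expression like this, the "first job to exceed its request" is evicted as soon as the startd's periodic policy evaluation notices the overage, which matches the behavior described above.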
Note that if the machine as a whole runs out of RAM and swap before Condor reacts, the OS's own out-of-memory handling comes into play.  On Linux that behavior is configurable, but by default the kernel's OOM killer starts killing processes until there's enough free RAM to proceed.  It picks victims by a per-process "badness" score that is largely proportional to memory use, which is why in practice the largest process usually seems to go first.
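On the Linux side, the kernel exposes its per-process OOM bookkeeping under /proc, so you can see who would likely be killed first.  A quick sketch (should work on any modern Linux; paths and the -500 value are just examples):

```shell
# Show the kernel's OOM "badness" score for the current process.
# Higher-scoring processes are killed first; the score is largely
# proportional to the share of RAM+swap the process is using.
cat /proc/self/oom_score

# List the five most killable processes on the machine by score.
for p in /proc/[0-9]*; do
    printf '%s %s\n' "$(cat "$p/oom_score" 2>/dev/null)" "${p#/proc/}"
done | sort -rn | head -5

# A critical daemon (e.g. the condor_master) can be shielded by
# lowering its adjustment (requires root):
#   echo -500 > /proc/<pid>/oom_score_adj
```

This is only diagnostic; it doesn't change what Condor itself evicts, just which processes the kernel would reap if the whole machine exhausts memory first.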

David Brodbeck
System Administrator, Linguistics
University of Washington