
Re: [Condor-users] Limiting memory for jobs

It's better to use a SYSTEM_PERIODIC_REMOVE expression, which
runs at the schedd and simply removes jobs with, for instance,
ImageSize > 2000000.
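As a hedged sketch, a condor_config fragment for this might look like the following (ImageSize is reported in KB, so 2000000 is roughly 2 GB; the exact threshold is an assumption you would tune to your slots):

```
# condor_config fragment (sketch): have the schedd periodically
# remove any job whose reported image size has grown past ~2 GB.
# ImageSize is in KB, so 2000000 KB ~= 2 GB.
SYSTEM_PERIODIC_REMOVE = (ImageSize > 2000000)
```

Because the expression is evaluated at the schedd, it applies to every job in the queue without any per-node wrapper machinery.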


On Wed, 22 Apr 2009, David Anderson wrote:

So I have a problem with people submitting jobs that slowly acquire memory,
in far greater amounts than my nodes can support per slot.  These cause
all kinds of problems, ranging from unresponsive swapping systems to the Linux
OOM killer going nuts and killing whatever processes it can find, including the
Condor daemons, ssh, etc.

Just this afternoon I put in place a USER_JOB_WRAPPER that limits the
virtual address space of the job's processes, but while doing that I started
thinking about how to do this better.  For instance, even though all my
compute nodes currently have 8 CPUs and 8 GB of RAM, that won't always be the
case.  So my question is: what is everyone else doing?  Is it possible for a
USER_JOB_WRAPPER to get information about the slot, such as what the
maximum memory should be, so that one script will work for slots of
varying size?  Is there a better way to do this?
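A minimal sketch of such a wrapper is below. It assumes a POSIX shell with the ulimit builtin, and that the starter exports _CONDOR_MACHINE_AD (a path to a file holding the slot's machine ClassAd, available in later Condor versions); if that variable is absent it falls back to a hard-coded 2 GB cap, which is an illustrative default, not a recommendation:

```shell
#!/bin/sh
# Sketch of a USER_JOB_WRAPPER: cap the job's virtual address space
# at the slot's advertised Memory, then hand control to the real job.

LIMIT_KB=$((2 * 1024 * 1024))   # fallback cap: 2 GB, expressed in KB

if [ -n "$_CONDOR_MACHINE_AD" ] && [ -r "$_CONDOR_MACHINE_AD" ]; then
    # The Memory attribute in the machine ClassAd is in MB,
    # e.g. a line of the form:  Memory = 8192
    MEM_MB=$(awk -F' = ' '/^Memory /{print $2}' "$_CONDOR_MACHINE_AD")
    [ -n "$MEM_MB" ] && LIMIT_KB=$((MEM_MB * 1024))
fi

ulimit -v "$LIMIT_KB"           # ulimit -v takes a limit in KB

# Replace this shell with the actual job (the starter passes the job's
# executable and arguments to the wrapper).
[ $# -gt 0 ] && exec "$@"
```

Parsing the machine ad this way keys the limit to whatever memory the slot actually advertises, so the same wrapper works across nodes of different sizes.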


Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.