
[Condor-users] Limiting memory for jobs

So I have a problem with people submitting jobs that slowly acquire memory, in far greater amounts than my nodes can support per slot.  These cause all kinds of problems, ranging from unresponsive, swapping systems to the Linux OOM killer going nuts and killing whatever processes it can find, including Condor daemons, ssh, etc.

Just this afternoon I put into place a USER_JOB_WRAPPER that limits the virtual address space for the job's processes, but while doing that I started thinking about how to do this better.  For instance, even though all my compute nodes currently have 8 CPUs and 8 GB of RAM, that won't always be the case.  So my question is: what is everyone else doing?  Is it possible for a USER_JOB_WRAPPER to get information about the slot, such as what its maximum memory should be, so that one script will work for slots of varying size?  Is there a better way to do this?
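
To make the question concrete, here is a rough sketch of what a slot-aware wrapper might look like. It is not the wrapper currently in place, and it rests on assumptions: that the starter exports a _CONDOR_MACHINE_AD environment variable pointing at a file containing the slot's ClassAd (this may depend on the Condor version), that the ad has a "Memory" attribute in megabytes, and that the wrapper is invoked with the job executable and its arguments as its own arguments. The helper names parse_memory_mb and slot_limit_bytes are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a USER_JOB_WRAPPER that caps address space at the slot's memory."""
import os
import re
import resource
import sys


def parse_memory_mb(ad_text):
    """Return the value of the Memory attribute (assumed MB), or None if absent."""
    match = re.search(r"^\s*Memory\s*=\s*(\d+)\s*$", ad_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def slot_limit_bytes(ad_path):
    """Read the machine ad file and convert the slot memory to bytes."""
    if not ad_path:
        return None
    try:
        with open(ad_path) as ad_file:
            memory_mb = parse_memory_mb(ad_file.read())
    except OSError:
        return None
    return memory_mb * 1024 * 1024 if memory_mb else None


def main(job_argv):
    # Assumption: _CONDOR_MACHINE_AD is set by the starter for this slot.
    limit = slot_limit_bytes(os.environ.get("_CONDOR_MACHINE_AD"))
    if limit is not None:
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            # An unprivileged process cannot raise its hard limit.
            limit = min(limit, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    # Replace the wrapper with the job so the limit applies to the job itself.
    os.execv(job_argv[0], job_argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

If no slot memory can be determined, the sketch runs the job without a limit; whether that or refusing to run is the right fallback is a policy choice. Note also that RLIMIT_AS caps virtual address space per process, not resident memory for the whole job, so a job that forks many children can still exceed the slot's share.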


David Anderson