Re: [Condor-users] enforcing job memory limits?

Todd Tannenbaum wrote:
Jonathan D. Proulx wrote:

it appears as though a condor job in my flock exhausted the memory on a
user workstation over night ($CondorVersion: 6.8.2 Oct 12 2006 $
$CondorPlatform: X86_64-LINUX_RHEL3 $).  This triggered Linux's OOM
killer which killed several desktop apps and sshd on the system.

this seems a pretty serious violation of the do no harm principle and
I'm a bit surprised.

is there a config setting I need to tweak on the workstations?

You could add a clause to your preempt expression, i.e. something like

   PREEMPT = ( whatever was there before ) || (ImageSize > (Memory-20))

This should work at least in the v6.9 series (and maybe in v6.8 as well? cannot recall offhand), since the condor_startd's value for "ImageSize" will be updated several times a minute to the total memory usage of the job.

Ouch, in my example above, I didn't deal with the fact that Memory is in megs and ImageSize is in Kbytes. So something like the following is what I meant:

PREEMPT = (whatever was there before) || \
          (ImageSize > ((Memory*0.8)*1024) )
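
To make the unit conversion concrete, here is a minimal Python sketch of the arithmetic behind the two clauses. It is only an illustration, not Condor configuration: the function names are made up, and it assumes (as stated above) that Memory is reported in megabytes and ImageSize in kilobytes.

```python
def naive_clause(image_size_kb, memory_mb):
    # First attempt: compares kilobytes directly against megabytes,
    # so it fires for almost any job larger than a couple of MB.
    return image_size_kb > (memory_mb - 20)


def scaled_clause(image_size_kb, memory_mb, fraction=0.8):
    # Corrected form: convert the memory limit from MB to KB (x1024)
    # and trigger once the job passes the given fraction of it.
    limit_kb = (memory_mb * fraction) * 1024
    return image_size_kb > limit_kb


if __name__ == "__main__":
    memory_mb = 2048                    # hypothetical machine with 2 GB
    small_job_kb = 100 * 1024           # job using about 100 MB
    large_job_kb = 1800 * 1024          # job using about 1800 MB

    print(naive_clause(small_job_kb, memory_mb))   # small job is flagged anyway
    print(scaled_clause(small_job_kb, memory_mb))  # small job is left alone
    print(scaled_clause(large_job_kb, memory_mb))  # large job is flagged
```

With these made-up numbers the naive clause would preempt even the 100 MB job, while the scaled clause only fires once the job passes roughly 1638 MB (80% of 2048 MB).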


Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685