Re: [Condor-users] STARTD-based memory limit

Hi Steve,

Take a look at the WANT_HOLD documentation in condor:

It has a good example of a startd policy for holding jobs. You can easily modify it to just preempt instead, or to implement whatever policy you like.
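
As a rough, untested sketch of what that could look like for your case (MEMORY_EXCEEDED is just a macro name chosen for illustration, the reason string is made up, and I'm assuming PREEMPT and WANT_SUSPEND are already defined earlier in your config so the self-references expand):

    # Sketch only, not a tested policy.
    # ImageSize (job ad) is in KiB; Memory (slot ad) is in MiB.
    MEMORY_EXCEEDED = ((ImageSize / 1024) > Memory)
    # Evict when the job outgrows the slot, on top of the existing policy.
    PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
    # Do not merely suspend a job that is over the limit.
    WANT_SUSPEND = ($(WANT_SUSPEND)) && (($(MEMORY_EXCEEDED)) =!= TRUE)
    # Put the evicted job on hold rather than back in the idle queue.
    WANT_HOLD = ($(MEMORY_EXCEEDED))
    WANT_HOLD_REASON = ifThenElse($(MEMORY_EXCEEDED), \
        "Job exceeded the memory of its slot.", undefined)

Because the comparison is against each slot's own Memory, nodes with more memory would allow bigger jobs. If you only want to preempt and not hold, leaving out the two WANT_HOLD lines should be enough.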

I believe the startd evaluates these expressions much more frequently than ClassAd updates reach the schedd.


On Jun 2, 2011, at 9:15 AM, Steven Timm wrote:

> In my cluster I have been using a schedd-based method of
> killing jobs that are using too much memory.
> [root@fcdf1x1 local]# condor_config_val SYSTEM_PERIODIC_REMOVE
> (NumJobStarts > 10) || (ImageSize>=2500000) || (JobRunCount>=1 && JobStatus==1 && ImageSize>=1000000)
> But this has two weaknesses.
> One is that sometimes it can take
> the shadow a long time to send the high memory value back to
> the schedd so the schedd can act, and in the meantime the job grows
> too fast and sucks up all ram on the node and starts killing other
> processes.
> The second one is that I have a diverse pool of nodes and
> would like jobs running on the nodes with bigger memory to use it if
> it is there.
> So is there a way to evict jobs for which (ImageSize*2>Memory)?
> Would you use the KILL or the PREEMPT expression?
> Steve Timm
