
Re: [Condor-users] STARTD-based memory limit



Hi Steve,

Take a look at the WANT_HOLD documentation in the Condor manual:
http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#SECTION004310000000000000000

It includes a good example of a startd policy that puts jobs on hold. You can easily modify it to simply preempt the job instead, or to implement whatever policy you like.

I believe the startd evaluates these policy expressions much more frequently than ClassAd updates are sent to the schedd, so it should react to a fast-growing job sooner.
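For illustration, here is a minimal startd-side sketch of the check asked about below (ImageSize*2 > Memory). It is not the manual's example. It assumes that the job's ImageSize is reported in KiB while the slot's Memory is in MiB (hence the factor of 1024), that PREEMPT already has a value earlier in the configuration, and that MEMORY_EXCEEDED is only a hypothetical macro name chosen here. Because the comparison uses each slot's own Memory, jobs on bigger-memory nodes are allowed to grow correspondingly larger.

```
# Hypothetical sketch; MEMORY_EXCEEDED is an illustrative macro name.
# Assumes TARGET.ImageSize (job) is in KiB and MY.Memory (slot) is in MiB.
MEMORY_EXCEEDED = (TARGET.ImageSize =!= UNDEFINED) && \
                  ((TARGET.ImageSize * 2) > (MY.Memory * 1024))

# Evict the job from the slot once the limit is exceeded.
# Assumes PREEMPT is already defined earlier in the config.
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))

# Optional: put the job on hold instead of returning it to idle.
# WANT_HOLD only matters once PREEMPT has become true.
WANT_HOLD = $(MEMORY_EXCEEDED)
WANT_HOLD_REASON = "Job image size exceeded half of the slot's memory"
```

On KILL versus PREEMPT: PREEMPT starts a graceful vacate, while KILL is evaluated during vacating and forces a hard kill if it becomes true. A nonzero MaxJobRetirementTime can also delay the effect of PREEMPT, so it is worth checking that value on these nodes if the goal is to act before memory runs out.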

-Derek




On Jun 2, 2011, at 9:15 AM, Steven Timm wrote:

> 
> In my cluster I have been using a schedd-based method of
> killing jobs that are using too much memory.
> 
> [root@fcdf1x1 local]# condor_config_val SYSTEM_PERIODIC_REMOVE
> (NumJobStarts > 10) || (ImageSize>=2500000) || (JobRunCount>=1 && JobStatus==1 && ImageSize>=1000000)
> 
> But this has two weaknesses:
> 
> The first is that it can sometimes take the shadow a long time
> to send the high memory value back to the schedd so that the schedd
> can act. In the meantime the job grows too fast, uses up all the RAM
> on the node, and other processes start getting killed.
> 
> The second is that I have a heterogeneous pool of nodes, and I
> would like jobs running on the nodes with more memory to be able to
> use it if it is there.
> 
> So is there a way to evict jobs for which (ImageSize*2 > Memory)?
> Would you use the KILL or the PREEMPT expression?
> 
> Steve Timm
> 
> 
> 
> -- 
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525
> timm@xxxxxxxx  http://home.fnal.gov/~timm/
> Fermilab Computing Division, Scientific Computing Facilities,
> Grid Facilities Department, FermiGrid Services Group, Group Leader.
> Lead of FermiCloud project.
