
[Condor-users] STARTD-based memory limit




In my cluster I have been using a schedd-based method of
killing jobs that are using too much memory.

[root@fcdf1x1 local]# condor_config_val SYSTEM_PERIODIC_REMOVE
(NumJobStarts > 10) || (ImageSize>=2500000) || (JobRunCount>=1 && JobStatus==1 && ImageSize>=1000000)

But this approach has two weaknesses.

One is that it can sometimes take the shadow a long time to send the
high memory value back to the schedd so the schedd can act; in the
meantime the job grows too fast, sucks up all the RAM on the node, and
other processes start getting killed.
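
For what it's worth, the delay seems to be governed by how often the
shadow pushes job-attribute updates (ImageSize among them) back to the
schedd.  Assuming the relevant knob is SHADOW_QUEUE_UPDATE_INTERVAL
(an assumption on my part), a sketch of shortening it would be:

# Assumption: this is the interval at which the shadow updates job
# attributes such as ImageSize in the schedd's job queue; the default
# is on the order of 15 minutes.  Shortening it narrows, but does not
# close, the window in which a fast-growing job can exhaust RAM.
SHADOW_QUEUE_UPDATE_INTERVAL = 120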

The second is that I have a diverse pool of nodes, and I would like
jobs running on the nodes with more memory to be able to use it when
it is there.

So, is there a way to evict jobs for which (ImageSize*2 > Memory)?
Would you use the KILL or the PREEMPT expression?
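
For concreteness, here is a sketch of the kind of startd policy I have
in mind, to go in the execute node's condor_config.  MEMORY_EXCEEDED is
just an illustrative macro name, and since ImageSize is reported in KiB
while the machine's Memory attribute is in MiB, the comparison needs a
unit conversion:

# Sketch only: evict a job whose image has grown past half of the
# machine's memory (the *2 mirrors the threshold above).
# ImageSize is in KiB, Memory is in MiB, hence the /1024.
MEMORY_EXCEEDED = ( (ImageSize / 1024) * 2 > Memory )

# PREEMPT asks the startd to evict the job once the threshold is crossed...
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
# ...and KILL hard-kills it if it does not go away gracefully.
KILL = ($(KILL)) || ($(MEMORY_EXCEEDED))
# Don't try to suspend a job we intend to evict for memory reasons.
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE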

Steve Timm



--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.