
[HTCondor-users] Problem with memory enforcement -> crashing Startd



Hi

I am trying to enforce a memory limit on our cluster.
I want Condor to compare the RAM a job actually uses with the memory it requested (Memory).
If the job uses more RAM than it requested, it should be put on hold.
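
To make the intended policy concrete, here is a minimal Python sketch of the comparison described above. It is only an illustration, not HTCondor code: the function name is hypothetical, the assumption that ResidentSetSize is in KiB and Memory in MB is taken from the division by 1024 in the configuration in this message, and using None for an attribute that has no value yet is an assumption of the sketch.

```python
def memory_exceeded(resident_set_size_kib, memory_mb):
    """Return True if the job's resident memory exceeds the requested memory.

    Hypothetical helper for illustration only.
    resident_set_size_kib: resident set size, assumed to be in KiB.
    memory_mb: requested memory, assumed to be in MB.
    Returns None when either input is missing (None), so the caller
    can tell "not exceeded" apart from "cannot decide yet".
    """
    if resident_set_size_kib is None or memory_mb is None:
        return None
    return resident_set_size_kib / 1024 > memory_mb


# A job that requested 1024 MB and currently uses 2 GiB of RAM.
print(memory_exceeded(2048 * 1024, 1024))
# The same job before any resident set size has been reported.
print(memory_exceeded(None, 1024))
```

The three possible results (True, False, None) are kept separate on purpose, so that a missing measurement is never silently treated as "within the limit".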

Starting from the example in the manual, I came up with:

# Only resident memory (RAM) is taken into account
MEMORY_USED_BY_JOB_MB = ResidentSetSize/1024
MEMORY_EXCEEDED = ($(MEMORY_USED_BY_JOB_MB)) > Memory
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ( $(MEMORY_EXCEEDED) )

WANT_HOLD_REASON = \
       ifThenElse( $(MEMORY_EXCEEDED), \
               "Your job exceeded the amount of requested memory on this machine.", \
               undefined )
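
For readers who want to see what the PREEMPT expression looks like once the $(...) macros are substituted, here is a rough Python sketch. It performs plain sequential text substitution only and is not HTCondor's actual configuration parser; the helper names are hypothetical, and the previous value of PREEMPT is not shown in this message, so the sketch uses FALSE purely as a placeholder.

```python
import re

# Matches macro references of the form $(NAME).
MACRO = re.compile(r"\$\((\w+)\)")


def expand(value, table):
    """Replace every $(NAME) in value with the text stored for NAME."""
    return MACRO.sub(lambda m: table.get(m.group(1), ""), value)


def load(lines, initial):
    """Process 'NAME = value' lines in order, expanding macros as we go."""
    table = dict(initial)
    for line in lines:
        # Split at the first '=' only, so operators like =!= stay in the value.
        name, _, value = line.partition("=")
        table[name.strip()] = expand(value.strip(), table)
    return table


CONFIG = [
    r"MEMORY_USED_BY_JOB_MB = ResidentSetSize/1024",
    r"MEMORY_EXCEEDED = ($(MEMORY_USED_BY_JOB_MB)) > Memory",
    r"PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))",
]

# "FALSE" is a placeholder assumption for the earlier PREEMPT setting.
table = load(CONFIG, {"PREEMPT": "FALSE"})
print(table["PREEMPT"])
```

The expanded text shows that PREEMPT ends up referring to both ResidentSetSize and Memory, which are the attributes the expression needs when it is evaluated against the ads.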

However, this causes the Startd to crash.

In the StartLog, it starts with:
03/07/13 11:16:42 slot1_1: Can't evaluate PREEMPT in the context of following ads
followed by the ClassAds of the running job (omitted here),
followed by:
03/07/13 11:16:42 ERROR "Can't evaluate PREEMPT" at line 1615 in file /slots/01/dir_12130/userdir/src/condor_startd.V6/Resource.cpp
03/07/13 11:16:42 slot1_1: Changing state and activity: Claimed/Busy -> Preempting/Killing
03/07/13 11:16:42 startd exiting because of fatal exception.
03/07/13 11:17:23 Setting maximum accepts per cycle 8.
03/07/13 11:17:23 ******************************************************
03/07/13 11:17:23 ** condor_startd (CONDOR_STARTD) STARTING UP
03/07/13 11:17:23 ** /usr/sbin/condor_startd
03/07/13 11:17:23 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
03/07/13 11:17:23 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
03/07/13 11:17:23 ** $CondorVersion: 7.8.7 Dec 12 2012 BuildID: 86173 $
03/07/13 11:17:23 ** $CondorPlatform: x86_64_deb_6.0 $
03/07/13 11:17:23 ** PID = 1224
03/07/13 11:17:23 ** Log last touched 3/7 11:16:42
03/07/13 11:17:23 ******************************************************

I would greatly appreciate any help in this regard.

Greetings from Austria,
Hermann
-- 
-------------
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien

Tel.  + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx