
Re: [HTCondor-users] Delayed hold



Rita,

If you want quicker holds, you may want to set WANT_HOLD on your
execute nodes. From section 3.3.10 of the manual:

###
WANT_HOLD
A boolean expression that defaults to False. When True and the value
of PREEMPT becomes True and WANT_SUSPEND is False and
MAXJOBRETIREMENTTIME has expired, the job is put on hold for the
reason (optionally) specified by the variables WANT_HOLD_REASON and
WANT_HOLD_SUBCODE. As usual, the job owner may specify
periodic_release and/or periodic_remove expressions to react to
specific hold states automatically. The attribute HoldReasonCode in
the job ClassAd is set to the value 21 when WANT_HOLD is responsible
for putting the job on hold.

Here is an example policy that puts jobs on hold that use too much
virtual memory:

VIRTUAL_MEMORY_AVAILABLE_MB = (VirtualMemory*0.9)
MEMORY_EXCEEDED = ImageSize/1024 > $(VIRTUAL_MEMORY_AVAILABLE_MB)
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ($(MEMORY_EXCEEDED))
WANT_HOLD_REASON = \
   ifThenElse( $(MEMORY_EXCEEDED), \
               "Your job used too much virtual memory.", \
               undefined )
###

Since the execute node evaluates this policy itself, it should help
avoid the "job crossed the memory threshold right after the shadow
received an update" issue.
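
On the submit side, the job owner could then react to that hold code
with something like the line below in the submit description file.
Treat it as a rough sketch rather than a recommended policy: the
one-hour grace period is an arbitrary value chosen for illustration,
and whether you want to remove such jobs at all (instead of fixing and
releasing them by hand) depends on your workflow.

# Illustrative submit description fragment (not from the manual).
# JobStatus == 5 means the job is held; HoldReasonCode == 21 means
# WANT_HOLD was responsible. The 3600-second grace period is an
# assumed value; adjust or drop it as needed.
periodic_remove = (JobStatus == 5) && (HoldReasonCode == 21) && (time() - EnteredCurrentStatus > 3600)

I would be wary of a matching periodic_release here, since a job that
was held for exceeding memory will most likely hit the same limit
again when it reruns.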


Thanks,
BC

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing