Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Delayed hold

Date: Mon, 30 Dec 2013 12:46:27 -0500
From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Delayed hold

Rita,

If you want quicker holds, you may want to set WANT_HOLD on your
execute nodes. From section 3.3.10 of the manual:

###
WANT_HOLD
A boolean expression that defaults to False. When True and the value
of PREEMPT becomes True and WANT_SUSPEND is False and
MAXJOBRETIREMENTTIME has expired, the job is put on hold for the
reason (optionally) specified by the variables WANT_HOLD_REASON and
WANT_HOLD_SUBCODE. As usual, the job owner may specify
periodic_release and/or periodic_remove expressions to react to
specific hold states automatically. The attribute HoldReasonCode in
the job ClassAd is set to the value 21 when WANT_HOLD is responsible
for putting the job on hold.

Here is an example policy that puts jobs on hold that use too much
virtual memory:

VIRTUAL_MEMORY_AVAILABLE_MB = (VirtualMemory*0.9)
MEMORY_EXCEEDED = ImageSize/1024 > $(VIRTUAL_MEMORY_AVAILABLE_MB)
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ($(MEMORY_EXCEEDED))
WANT_HOLD_REASON = \
   ifThenElse( $(MEMORY_EXCEEDED), \
               "Your job used too much virtual memory.", \
               undefined )
###

This will help avoid the "job crossed the memory threshold right after
the shadow received an update" issue.
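
On the submit side, the job owner can react to these holds
automatically, as the manual text above notes. Here is a minimal
sketch of a submit description fragment, assuming you would rather
remove such jobs than leave them held. The executable name is
hypothetical, and the expression is an illustration rather than
something taken from the manual:

```
# Hypothetical submit description fragment (illustration only).
executable = my_job.sh

# JobStatus == 5 means the job is held; HoldReasonCode == 21 means
# WANT_HOLD on the execute node was responsible for the hold.
periodic_remove = (JobStatus == 5) && (HoldReasonCode == 21)

queue
```

If you would rather retry the job, a periodic_release expression
testing the same HoldReasonCode could be used instead, though a job
that keeps exceeding memory would simply be held again.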


Thanks,
BC

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing
