
[HTCondor-users] Preempt jobs which exceed their request_memory - but no parallel universe?



I'm confused.

I have a couple of users who underestimate the memory their jobs
will attempt to allocate, and as a result some worker nodes end
up swapping heavily.
I tried to get those jobs preempted and sent back to the queue
with an updated request_memory (based on ImageSize):

# Let job use its declared amount of memory and some more
MEMORY_EXTRA            = 2048
MEMORY_ALLOWED          = (Memory + $(MEMORY_EXTRA)*Cpus)
# Current footprint (ImageSize is in KiB, slot Memory in MiB)
MEMORY_CURRENT          = (ImageSize/1024)
# Exceeds expectations?
MEMORY_EXCEEDED         = $(MEMORY_CURRENT) > $(MEMORY_ALLOWED)
# If exceeding, preempt
#[preset]PREEMPT        = False
PREEMPT                 = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND            = False
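
If I understand the macros correctly, on a slot with Memory = 4096
and Cpus = 2 this should evaluate to

    MEMORY_ALLOWED  = 4096 + 2048*2 = 8192   (MiB)

and since ImageSize is reported in KiB, dividing by 1024 should make
MEMORY_CURRENT directly comparable, i.e. a job on that slot would be
preempted once its footprint grows past 8 GB. At least that's the
intent.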

As I don't want the negotiator to preempt jobs based on user priorities
(I still believe that only the STARTD/SHADOW side should take care of
this), I have the following settings in place as well, taken from
config lines collected over the years:

# condor_config_val -dump | grep PREEMPT
NEGOTIATOR_CONSIDER_EARLY_PREEMPTION = false
NEGOTIATOR_CONSIDER_PREEMPTION = False
PREEMPT_VANILLA = False
PREEMPTION_RANK = (RemoteUserPrio * 1000000) - ifThenElse(isUndefined(TotalJobRuntime), 0, TotalJobRuntime)
PREEMPTION_REQUIREMENTS = False
SCHEDD_PREEMPTION_RANK =
SCHEDD_PREEMPTION_REQUIREMENTS =

Um, that doesn't look too nice, but with preemption disabled in the
negotiator, PREEMPTION_RANK shouldn't be consulted at all, right?
And I can't find PREEMPT_VANILLA mentioned in the manual anywhere.
Something *must* be stopping these rules from taking effect, as I
still see 7 GB jobs running in 2 GB slots :(

Also, I don't want the preemption rule to apply to Parallel Universe
jobs (some rank-0 processes tend to use far more memory than the
other ranks, and requesting that huge amount for every rank is simply
not feasible). How can I exclude them?
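
What I had in mind was restricting the check to non-parallel jobs via
the job's universe attribute (Parallel Universe should be
JobUniverse == 11, if I read the job ClassAd attribute list
correctly), along the lines of:

    # Only apply the memory check to non-parallel-universe jobs
    MEMORY_EXCEEDED = (TARGET.JobUniverse =!= 11) && \
                      ($(MEMORY_CURRENT) > $(MEMORY_ALLOWED))

but I'm not sure whether that's the recommended way, or whether
there's a cleaner knob for this.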

Help is appreciated!
Thanks,
 Steffen