
[HTCondor-users] Getting preemption going..



Hi all,

we are stuck trying to get preemption going while guaranteeing some
minimal run times.

For this, we define the following on the startd:

MinRunTimeHours = 1
STARTD_ATTRS =  MinRunTimeHours

(we will have several classes of machines where we set this to 1, 5, 10
or 20 hours).

On the negotiator, we set

JobExceedsMinRunTime = $(ActivationTimer) > ( MinRunTimeHours * 60)
NewUserBetterPrio = RemoteUserPrio > SubmitterUserPrio * 1.2
PREEMPTION_REQUIREMENTS = ($(JobExceedsMinRunTime)) &&
($(NewUserBetterPrio))

For debugging, the expression looks a bit longer, but that does not
really add much else to it[1].

During a negotiation cycle, PREEMPTION_REQUIREMENTS does evaluate to
true, and as we do not set rank to anything other than 0, we would expect
the idle job to preempt the running job.

We currently have pslot preemption enabled, as all nodes feature a single
large partitionable slot:

ALLOW_PSLOT_PREEMPTION = True
MAXJOBRETIREMENTTIME = 600
NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_CONSIDER_EARLY_PREEMPTION = True (same happens with False here)


For testing, we submit two job clusters which fully fill a target node,
and to make things easier, both clusters compete for the very same
machine, in our case "a3305", via Requirements = (Machine ==
"a3305.atlas.local") in the job submit file.

Debug output for a preemption match looks like this:

03/16/20 16:44:53 Classad debug: 1584347460 --> 1584347460
03/16/20 16:44:53 Classad debug: [0.00906ms] JobStart --> 1584347460
03/16/20 16:44:53 Classad debug: time() --> 1584377093
03/16/20 16:44:53 Classad debug: 1584347460 --> 1584347460
03/16/20 16:44:53 Classad debug: [0.00596ms] JobStart --> 1584347460
03/16/20 16:44:53 Classad debug: [0.03695ms] ifThenElse(JobStart isnt
undefined,(time() - JobStart),0) --> 29633
03/16/20 16:44:53 Classad debug: 1 --> 1
03/16/20 16:44:53 Classad debug: [0.00691ms] MinRunTimeHours --> 1
03/16/20 16:44:53 Classad debug: [0.00095ms] RemoteUserPrio --> 353331
03/16/20 16:44:53 Classad debug: [0.00095ms] SubmitterUserPrio --> 230.536
03/16/20 16:44:53 Classad debug: "a3305.atlas.local" --> a3305.atlas.local
03/16/20 16:44:53 Classad debug: [0.00691ms] Machine --> a3305.atlas.local
03/16/20 16:44:53 Classad debug: [0.00095ms] MY --> CLASSAD
03/16/20 16:44:53 Classad debug: [0.00095ms] "user.a@xxxxxxxxxxx" -->
user.a@xxxxxxxxxxx
03/16/20 16:44:53 Classad debug: [0.01502ms] MY.AccountingGroup -->
user.a@xxxxxxxxxxx
03/16/20 16:44:53 Classad debug: .RIGHT --> CLASSAD
03/16/20 16:44:53 Classad debug: [0.00691ms] TARGET --> CLASSAD
03/16/20 16:44:53 Classad debug: "user.b" --> user.b
03/16/20 16:44:53 Classad debug: [0.02098ms] TARGET.AccountingGroup -->
user.b
03/16/20 16:44:53 Classad debug: MY --> CLASSAD
03/16/20 16:44:53 Classad debug: 0.0 --> 0
03/16/20 16:44:53 Classad debug: [0.01311ms] MY.rank --> 0
03/16/20 16:44:53 Classad debug: [0.15998ms] ifThenElse(JobStart isnt
undefined,(time() - JobStart),0) > (MinRunTimeHours * 60) &&
RemoteUserPrio > SubmitterUserPrio * 1.200000000000000E+00 &&
Machine isnt undefined && MY.AccountingGroup isnt undefined &&
TARGET.AccountingGroup isnt undefined && (MY.rank isnt undefined ||
TARGET.rank isnt undefined) --> TRUE
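
As a sanity check on the arithmetic, the logged values can be plugged back into the two sub-expressions by hand. The following is only a rough Python sketch, not anything HTCondor runs: the function name and the `seconds_per_unit` parameter are made up for illustration, and the numbers are copied from the debug output above. It also shows one side observation: the log evaluates the timer as time() - JobStart, i.e. in seconds, so MinRunTimeHours * 60 is a threshold of 60 seconds per "hour", whereas * 3600 would be needed for real hours. For this particular job both variants come out true, so this does not by itself explain the missing preemption.

```python
# Values copied from the negotiator debug output above.
JOB_START = 1584347460         # JobStart
NOW = 1584377093               # time()
MIN_RUN_TIME_HOURS = 1         # MinRunTimeHours
REMOTE_USER_PRIO = 353331.0    # RemoteUserPrio
SUBMITTER_USER_PRIO = 230.536  # SubmitterUserPrio


def preemption_requirements(seconds_per_unit: int) -> bool:
    """Re-evaluate JobExceedsMinRunTime && NewUserBetterPrio by hand.

    seconds_per_unit is a hypothetical knob for this sketch:
    60 reproduces the expression as configured, 3600 treats
    MinRunTimeHours as actual hours.
    """
    runtime = NOW - JOB_START  # seconds, as in ifThenElse(...) in the log
    job_exceeds_min_run_time = runtime > MIN_RUN_TIME_HOURS * seconds_per_unit
    new_user_better_prio = REMOTE_USER_PRIO > SUBMITTER_USER_PRIO * 1.2
    return job_exceeds_min_run_time and new_user_better_prio


runtime_seconds = NOW - JOB_START
as_configured = preemption_requirements(60)   # threshold: 60 s
with_hours = preemption_requirements(3600)    # threshold: 3600 s
print(runtime_seconds, as_configured, with_hours)
```

The runtime works out to 29633 seconds (a bit over 8 hours), matching the value in the log, so the job is well past the threshold either way.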

But after doing this for every running job, the cycle ends with (lines
from a later cycle):

03/16/20 16:55:35     Send END_NEGOTIATE to remote schedd
03/16/20 16:55:35   Submitter user.b@xxxxxxxxxxx got all it wants;
removing it.
03/16/20 16:55:35  resources used by user.b@xxxxxxxxxxx are 0.000000

Does anyone have an idea what we are doing wrong here?

cheers and thanks a lot in advance for any hint!

Carsten

[1] PREEMPTION_REQUIREMENTS = debug( $(JobExceedsMinRunTime) &&
$(NewUserBetterPrio) && Machine =!= UNDEFINED && MY.AccountingGroup =!=
UNDEFINED && TARGET.AccountingGroup =!= UNDEFINED && (MY.rank =!=
UNDEFINED || TARGET.rank =!= UNDEFINED))
-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
