
Re: [HTCondor-users] Getting preemption going..



Hi Carsten,

Nice approach. There is a '$' missing, but that is probably just an error that crept in while copying to e-mail ;)

JobExceedsMinRunTime = $(ActivationTimer) > ( MinRunTimeHours * 60)

Best
Christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Carsten Aulbert" <carsten.aulbert@xxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Monday, 16 March 2020 17:57:53
Subject: [HTCondor-users] Getting preemption going..

Hi all,

we are somehow stuck with trying to get preemption going while
guaranteeing some minimal run times.

For this, we define the following on the startd:

MinRunTimeHours = 1
STARTD_ATTRS =  MinRunTimeHours

(we will have several classes of machines where we set this to 1, 5, 10
or 20 hours).

On the negotiator, we set

JobExceedsMinRunTime = $(ActivationTimer) > ( MinRunTimeHours * 60)
NewUserBetterPrio = RemoteUserPrio > SubmitterUserPrio * 1.2
PREEMPTION_REQUIREMENTS = ($(JobExceedsMinRunTime)) && ($(NewUserBetterPrio))

For debugging it looks a bit longer, but that does not really add much else to
it[1].

During a negotiation cycle, PREEMPTION_REQUIREMENTS does evaluate to
true, and as we do not set rank to anything other than 0, we would expect
the idle job to preempt the running job.

We currently have pslot preemption enabled, as all nodes feature a single
large partitionable slot:

ALLOW_PSLOT_PREEMPTION = True
MAXJOBRETIREMENTTIME = 600
NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_CONSIDER_EARLY_PREEMPTION = True (same happens with False here)


For testing, we submit two job clusters, each of which fully fills a target
node. To make things easier, both clusters compete for the very same
machine, in our case "a3305", via Requirements = (Machine ==
"a3305.atlas.local") in the job submit file.

Debug output for a preemption match looks like

03/16/20 16:44:53 Classad debug: 1584347460 --> 1584347460
03/16/20 16:44:53 Classad debug: [0.00906ms] JobStart --> 1584347460
03/16/20 16:44:53 Classad debug: time() --> 1584377093
03/16/20 16:44:53 Classad debug: 1584347460 --> 1584347460
03/16/20 16:44:53 Classad debug: [0.00596ms] JobStart --> 1584347460
03/16/20 16:44:53 Classad debug: [0.03695ms] ifThenElse(JobStart isnt
undefined,(time() - JobStart),0) --> 29633
03/16/20 16:44:53 Classad debug: 1 --> 1
03/16/20 16:44:53 Classad debug: [0.00691ms] MinRunTimeHours --> 1
03/16/20 16:44:53 Classad debug: [0.00095ms] RemoteUserPrio --> 353331
03/16/20 16:44:53 Classad debug: [0.00095ms] SubmitterUserPrio --> 230.536
03/16/20 16:44:53 Classad debug: "a3305.atlas.local" --> a3305.atlas.local
03/16/20 16:44:53 Classad debug: [0.00691ms] Machine --> a3305.atlas.local
03/16/20 16:44:53 Classad debug: [0.00095ms] MY --> CLASSAD
03/16/20 16:44:53 Classad debug: [0.00095ms] "user.a@xxxxxxxxxxx" -->
user.a@xxxxxxxxxxx
03/16/20 16:44:53 Classad debug: [0.01502ms] MY.AccountingGroup -->
user.a@xxxxxxxxxxx
03/16/20 16:44:53 Classad debug: .RIGHT --> CLASSAD
03/16/20 16:44:53 Classad debug: [0.00691ms] TARGET --> CLASSAD
03/16/20 16:44:53 Classad debug: "user.b" --> user.b
03/16/20 16:44:53 Classad debug: [0.02098ms] TARGET.AccountingGroup -->
user.b
03/16/20 16:44:53 Classad debug: MY --> CLASSAD
03/16/20 16:44:53 Classad debug: 0.0 --> 0
03/16/20 16:44:53 Classad debug: [0.01311ms] MY.rank --> 0
03/16/20 16:44:53 Classad debug: [0.15998ms] ifThenElse(JobStart isnt undefined,(time() - JobStart),0) > (MinRunTimeHours * 60) && RemoteUserPrio > SubmitterUserPrio * 1.200000000000000E+00 && Machine isnt undefined && MY.AccountingGroup isnt undefined && TARGET.AccountingGroup isnt undefined && (MY.rank isnt undefined || TARGET.rank isnt undefined) --> TRUE
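
The arithmetic behind this TRUE can be replayed by hand. The following is only a rough sketch in Python, not HTCondor code: the numbers are copied from the debug lines above, and the variable names are made up for illustration. It also assumes the timer counts seconds, as the time() - JobStart expansion suggests; under that assumption, MinRunTimeHours * 60 is a 60-second threshold, so the sketch additionally evaluates a hypothetical * 3600 variant for comparison.

```python
# Values copied from the negotiator debug output above.
now = 1584377093               # time()
job_start = 1584347460         # JobStart
min_run_time_hours = 1         # MinRunTimeHours
remote_user_prio = 353331.0    # RemoteUserPrio
submitter_user_prio = 230.536  # SubmitterUserPrio

# ifThenElse(JobStart isnt undefined, (time() - JobStart), 0), in seconds
activation_timer = now - job_start

# The two sub-expressions as configured on the negotiator.
job_exceeds_min_run_time = activation_timer > min_run_time_hours * 60
new_user_better_prio = remote_user_prio > submitter_user_prio * 1.2
preemption_requirements = job_exceeds_min_run_time and new_user_better_prio

# Hypothetical variant (an assumption, not from the original config):
# convert hours to seconds before comparing.
job_exceeds_min_run_time_seconds = activation_timer > min_run_time_hours * 3600

print(activation_timer)                  # 29633
print(preemption_requirements)           # True
print(job_exceeds_min_run_time_seconds)  # True
```

With these values the job has run for roughly 8.2 hours, so both variants agree here; they would differ only for jobs that have run between one minute and one hour.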

But after doing this for every running job it ends with (lines from a
later cycle):

03/16/20 16:55:35     Send END_NEGOTIATE to remote schedd
03/16/20 16:55:35   Submitter user.b@xxxxxxxxxxx got all it wants;
removing it.
03/16/20 16:55:35  resources used by user.b@xxxxxxxxxxx are 0.000000

Does anyone have an idea what we are doing wrong here?

cheers and thanks a lot in advance for any hint!

Carsten

[1] PREEMPTION_REQUIREMENTS = debug( $(JobExceedsMinRunTime) &&
$(NewUserBetterPrio) && Machine =!= UNDEFINED && MY.AccountingGroup =!=
UNDEFINED && TARGET.AccountingGroup =!= UNDEFINED && (MY.rank =!=
UNDEFINED || TARGET.rank =!= UNDEFINED))
-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185

