
[HTCondor-users] Troubleshooting job eviction by machine RANK



We've been running HTCondor here for many years, but with a fairly static configuration in which each research group's jobs run only on the machines owned by that group.

We did this by defining a variable, CondorGroup, which is set per group and also used in the START expression, something like:

CondorGroup = "novafarm"
SUBMIT_EXPRS = CondorGroup, $(SUBMIT_EXPRS)
START = (CondorGroup =?= "novafarm") || (CondorGroup =?= "system")

We're now trying (belatedly!) to enable opportunistic scheduling so that idle resources can be used more efficiently. My approach is to set a machine RANK, and to let jobs run on idle machines belonging to other groups by defining an additional variable, CanEvict, so:

CondorGroup = "novafarm"
SUBMIT_EXPRS = CondorGroup, $(SUBMIT_EXPRS)
START = (CondorGroup =?= "novafarm") || (CondorGroup =?= "system") || CanEvict
RANK = (20 * CondorGroup =?= "novafarm") + (10 * CondorGroup =?= "bes3farm")
MachineMaxVacateTime = 300

So someone can then submit a job with, e.g., +CondorGroup = "general" and +CanEvict = True in order to run on any vacant slot.
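To make the intended policy concrete, here is a small Python model of the START and RANK lines above, evaluated against a guest job and a novafarm job. It is a sketch with simplified ClassAd semantics: job ads are plain dicts, a missing key stands in for UNDEFINED, and the function names and the ERROR stand-in are hypothetical. One thing the model highlights: assuming ClassAd operators follow C-like precedence, with `*` binding more tightly than `=?=`, the RANK line as written would group as `(20 * CondorGroup) =?= "novafarm"`, which can never be true for a string-valued CondorGroup, whereas the intended reading needs each comparison evaluated before it is weighted.

```python
# Hypothetical model of the novafarm START/RANK policy. Job ads are dicts;
# a missing key stands in for ClassAd UNDEFINED.

ERROR = object()  # hypothetical stand-in for the ClassAd ERROR value


def start(job: dict) -> bool:
    # START = (CondorGroup =?= "novafarm") || (CondorGroup =?= "system") || CanEvict
    group = job.get("CondorGroup")
    return group == "novafarm" or group == "system" or job.get("CanEvict") is True


def intended_rank(job: dict) -> int:
    # Intended meaning: novafarm jobs score 20, bes3farm jobs 10, everything else 0.
    group = job.get("CondorGroup")
    return (20 if group == "novafarm" else 0) + (10 if group == "bes3farm" else 0)


def ca_mul(a, b):
    # Simplified ClassAd '*': UNDEFINED (None) propagates, numbers multiply,
    # any other operand (such as a string) gives ERROR.
    if a is None or b is None:
        return None
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a * b
    return ERROR


def ca_meta_eq(a, b) -> bool:
    # '=?=' is TRUE only when both sides have the same type and value.
    return type(a) is type(b) and a == b


def c_precedence_terms(job: dict) -> tuple:
    # Grouping if '*' binds tighter than '=?=':
    # (20 * CondorGroup) =?= "novafarm" and (10 * CondorGroup) =?= "bes3farm"
    group = job.get("CondorGroup")
    return (ca_meta_eq(ca_mul(20, group), "novafarm"),
            ca_meta_eq(ca_mul(10, group), "bes3farm"))


guest = {"CondorGroup": "general", "CanEvict": True}
owner = {"CondorGroup": "novafarm"}
print(start(guest), intended_rank(guest), intended_rank(owner))  # True 0 20
print(c_precedence_terms(guest), c_precedence_terms(owner))      # (False, False) (False, False)
```

Under that grouping every job, owner or guest, gets the same (zero-valued) terms, so the machine would have no reason to prefer the owner's job. It is worth checking the precedence rules in the ClassAd language reference rather than relying on this model.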

What we find is that these jobs do start on the other groups' machines, but when the owning group then submits its own jobs, the guest jobs are never evicted.

Apart from the following, I have nothing else defined regarding preemption:

WANT_SUSPEND           = TRUE
WANT_VACATE            = TRUE
SUSPEND                = FALSE
CONTINUE               = TRUE
PREEMPT                = FALSE
KILL                   = FALSE
PREEMPTION_REQUIREMENTS        = FALSE
PREEMPTION_RANK                = 0

but my understanding was that RANK alone should be enough to achieve the desired effect.
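The expectation described here can be written as a one-line decision rule. This is a simplified model, not HTCondor's negotiator code, and the function name is hypothetical: it assumes a claimed slot should switch jobs only when the candidate satisfies START and the slot's RANK for it is strictly greater than for the running job. It ignores retirement time and user-priority preemption (the latter is already disabled by PREEMPTION_REQUIREMENTS = FALSE).

```python
def rank_preempts(candidate_starts: bool, candidate_rank: float, running_rank: float) -> bool:
    # Hypothetical rule: the candidate must pass START and be ranked strictly
    # higher than the running job; equal ranks (e.g. 0 vs 0) never preempt.
    return candidate_starts and candidate_rank > running_rank


# Guest job running with rank 0; a novafarm job arrives.
print(rank_preempts(True, 20, 0))  # True: the intended outcome
print(rank_preempts(True, 0, 0))   # False: what happens if RANK is 0 for every job
```

The second case matches the symptom described above: if RANK evaluates to the same value for owner and guest jobs, the guest is never displaced.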

Have I written enough for anyone to say where I'm going wrong?

Thanks, Graham