
[Condor-users] Help improving how machines are chosen for idle jobs



I'm trying to improve the way jobs run in my system. Right now I'm not
seeing a lot of width when clusters execute, even though I've set
NEGOTIATOR_IGNORE_USER_PRIORITIES = True. I think it may be due to how I
have machines sorted.

My understanding is that a schedd takes a list of all the available
machines, and for each job finds all the machines that match the job,
aren't in the 'Owner' state, and have START = True. Is this correct?
Then the list of machines is sorted, first by NEGOTIATOR_PRE_JOB_RANK.
If there is only one machine at the top of the list, this is our
candidate. If more than one machine ties for the top spot, they're
sorted by the machine RANK. If there's still more than one tied for the
top spot, they're sorted by NEGOTIATOR_POST_JOB_RANK. At this point, if
the top machine is Unclaimed, the sorting stops and this is the machine
for the job. If we have to preempt a running job, the list is checked
against PREEMPTION_REQUIREMENTS, and the machines for which it evaluates
to true are then sorted by PREEMPTION_RANK to find the machine for the
job.
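
To make the ordering described above concrete, here is a minimal Python
sketch of that understanding. It only models the description in this
message, not Condor's actual negotiator code, and it leaves out the
preemption step. The Machine class, its field names, and pick_candidate
are hypothetical, and the rank fields are assumed to hold already
evaluated values of the corresponding expressions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical stand-in for a machine ad (not a Condor API).
    name: str
    state: str            # e.g. "Unclaimed", "Claimed", "Owner"
    start: bool           # evaluated value of START
    pre_job_rank: float   # evaluated NEGOTIATOR_PRE_JOB_RANK
    rank: float           # evaluated machine RANK
    post_job_rank: float  # evaluated NEGOTIATOR_POST_JOB_RANK


def pick_candidate(machines):
    """Return the top machine under the ordering described above, or None."""
    # Keep only machines that are not in the Owner state and have START true.
    eligible = [m for m in machines if m.state != "Owner" and m.start]
    if not eligible:
        return None
    # Tuples compare element by element, so rank and post_job_rank are
    # consulted only to break ties on the earlier keys; higher wins.
    return max(eligible,
               key=lambda m: (m.pre_job_rank, m.rank, m.post_job_rank))


machines = [
    Machine("a", "Claimed", True, 5.0, 1.0, 0.0),
    Machine("b", "Unclaimed", True, 5.0, 1.0, 1.0),
    Machine("c", "Owner", True, 9.0, 9.0, 9.0),
]
print(pick_candidate(machines).name)  # prints "b"
```

Here "a" and "b" tie on the first two keys, so the third key decides,
and "c" is never considered because it is in the Owner state.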

Have I got that right? I think part of the reason I can't get the
resource usage patterns I want is that I'm interpreting the sorting
wrong. Can someone correct the sort explanation above please?

Also, who does the sorting at each pass? The negotiator or the schedd?
If it's the schedd, does the negotiator tell it how to sort based on
these settings, or do I have to make sure every schedd in my system has
these four settings made?

If I have this right, then to improve how wide my clusters run, I need
to make sure machines that are unclaimed sort first. So would the
following settings do that?

## Prefer unclaimed machines that have been that way for a long
## time.
NEGOTIATOR_PRE_JOB_RANK = ((Activity =?= "Unclaimed") * 1000000000) - \
  EnteredCurrentActivity

## Prefer unclaimed machines, otherwise let preemption ranking
## sort them.
NEGOTIATOR_POST_JOB_RANK = Activity =?= "Unclaimed"

## Only preempt if the job has just started, and its ranking on
## this machine is much worse than the new job's ranking on this
## machine, and only if it comes from either another user or another
## claiming machine.
PREEMPTION_REQUIREMENTS = (CurrentTime - MY.JobStart) < 240 && \
  (TARGET.Rank - MY.Rank) > 2880 && \
  ((TARGET.RemoteOwner =!= MY.RemoteOwner) || \
   (TARGET.ClientMachine =!= MY.ClientMachine))

## Sort machines for preemption based on their machine rank and who
## is running the job. Strongly prefer to preempt other people's jobs.
PREEMPTION_RANK = 1/(TARGET.Rank + ((TARGET.RemoteOwner =?= \
  MY.RemoteOwner) * 1000000000))
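
As a quick check of the arithmetic in the NEGOTIATOR_PRE_JOB_RANK
expression above, the sketch below evaluates the same formula in Python
for a few made-up machines. The machine names and timestamps are
invented, and the "unclaimed" test is passed in as a plain boolean that
is assumed to contribute 1 when true and 0 when false.

```python
def pre_job_rank(is_unclaimed, entered_current_activity):
    # Mirrors (<unclaimed test> * 1000000000) - EnteredCurrentActivity,
    # with the test assumed to count as 1 when true and 0 when false.
    return int(is_unclaimed) * 1000000000 - entered_current_activity


# Hypothetical machines: (name, unclaimed?, EnteredCurrentActivity in
# seconds since the epoch).
machines = [
    ("just-freed", True, 1200003600),
    ("busy", False, 1200000000),
    ("long-idle", True, 1200000000),
]

# Higher rank sorts first.
ordered = sorted(machines,
                 key=lambda m: pre_job_rank(m[1], m[2]),
                 reverse=True)
print([name for name, _, _ in ordered])
# prints ['long-idle', 'just-freed', 'busy']
```

Under these assumptions the bonus term puts every unclaimed machine
ahead of the busy one, and the older (smaller) timestamp wins among the
unclaimed machines.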


Thanks!

- Ian