
Re: [Condor-users] Dagging deeper in priorities and ranks



On 7/17/07, Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx> wrote:
Hi Matt,
> NEGOTIATOR_PRE_JOB_RANK is the preference of a job to run on a machine
> which overrides the job's rank.
> NEGOTIATOR_POST_JOB_RANK is the preference of a job to run on a
> machine which is subordinate to the job's definition.
>
> They have no effect on which job in a queue gets to run first...
>
Thanks, I guess I completely misunderstood these attributes. I thought
that they simply reorder the list of available resources
after requirements were matched.

Remember that there are two orderings:

Jobs 'score' machines to express a 'weak' preference (the negotiator
can apply a pre and post order to this as described above). There is
no guarantee that this preference will be followed because...

Machines score jobs (via their startd's RANK config). This is outside
the control of the negotiator, though it tries to respect it since
ignoring it is wasteful.
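
To make the first ordering concrete, here is a minimal sketch of a job
choosing among machines that already satisfy its requirements. It is a
toy model, not Condor code: pick_machine, the three scoring callables and
the machine attributes are all hypothetical names. It only shows the
"pre overrides, post is subordinate" relationship as a lexicographic
comparison, and it deliberately ignores the startd's RANK.

```python
def pick_machine(machines, pre_rank, job_rank, post_rank):
    """Pick the machine with the best (pre, job, post) score tuple.

    Tuples compare left to right, so pre_rank dominates, the job's own
    rank only breaks ties in pre_rank, and post_rank only breaks ties
    in both. Higher is treated as better (an assumption of this sketch).
    """
    return max(machines, key=lambda m: (pre_rank(m), job_rank(m), post_rank(m)))


# Hypothetical machines that have already passed the requirements match.
machines = [
    {"name": "a", "claimed": True, "memory": 8192},
    {"name": "b", "claimed": False, "memory": 2048},
    {"name": "c", "claimed": False, "memory": 4096},
]

prefer_unclaimed = lambda m: 0 if m["claimed"] else 1  # stands in for the pre rank
prefer_memory = lambda m: m["memory"]                  # stands in for the job's rank
no_preference = lambda m: 0                            # stands in for the post rank

chosen = pick_machine(machines, prefer_unclaimed, prefer_memory, no_preference)
```

Nothing in this function looks at any other job, which is the point:
these ranks pick a machine for a given job, not a job for a given slot.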

The picture is further complicated by the way in which the negotiation
algorithm works its way through the queues (based on user priority)
and further through the jobs in each queue (based on job priority and
then time of submission).
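
As a rough illustration of that traversal, the sketch below sorts jobs
by user priority, then job priority, then submission time. It is a
simplified model under stated assumptions: the Job class and
negotiation_order are hypothetical names, a numerically lower user
priority is treated as better, a numerically higher job priority is
treated as better, and the real negotiator does considerably more than a
single sort.

```python
from dataclasses import dataclass


@dataclass
class Job:
    owner: str
    job_prio: int      # assumed: higher goes first within one owner's queue
    submit_time: int   # earlier submission wins ties


def negotiation_order(jobs, user_prio):
    """Return jobs in the order this toy model would consider them.

    user_prio maps owner -> user priority; lower is treated as better
    here. Note that no rank expression appears anywhere in the key.
    """
    return sorted(
        jobs,
        key=lambda j: (user_prio[j.owner], -j.job_prio, j.submit_time),
    )


jobs = [
    Job("alice", 0, 10),
    Job("bob", 5, 1),
    Job("alice", 3, 20),
]
user_prio = {"alice": 0.5, "bob": 2.0}
ordered = negotiation_order(jobs, user_prio)
```

The job's rank never enters the sort key, which is why changing it does
not change which job gets considered first.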

A job's rank (and by extension NEGOTIATOR_{PRE|POST}_JOB_RANK) should
_probably_ never refer to attributes of the job which you are using
solely for the ordering of the jobs relative to their brethren in the
queue.

Instead you want to either:

1) Accept the massive hit in preemption and resulting use of
retirement time and let the *startd* RANK reference the id of the job
(with some other means of distinguishing between the relative
importance of different queues). Note that this last bit is a PITA and
is why you should avoid this route.

2) Make your submitting machines organize themselves properly (the
intended behaviour) and set each job's priority to be the negation of
the dag job id. This is fine except when a job from a newer dag gets
running because it was top of the queue, and then a new job from an
older dag is created. This may trigger a preemption, since the schedd
will present this new job before the running one; if there are no more
machines available the negotiator may preempt the running job to make
way. This is entirely rational from the point of view of the
negotiator: it will place your jobs as the queue presents them.
To get round this, if such preemption is viewed as too aggressive a way
to manage the order, you either make use of the aforementioned
retirement or have some helper task which boosts the priority of any
job which starts to run, such that it cannot be preempted by any
non-running job (removing this boost if it is preempted for another
reason is additional hassle though).
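
The situation described in option 2 can be sketched like this. The code
is only a toy model: the Job class, head_of_queue, boost_running and
RUNNING_BOOST are hypothetical names rather than anything Condor
provides, and the real helper task would have to change priorities
through whatever tooling you use on the schedd.

```python
from dataclasses import dataclass

RUNNING_BOOST = 1_000_000  # assumed to be larger than any dag job id


@dataclass
class Job:
    name: str
    dag_id: int
    running: bool = False
    boost: int = 0

    @property
    def priority(self):
        # Negated dag id: jobs from older (lower id) dags sort first.
        return -self.dag_id + self.boost


def head_of_queue(jobs):
    """The job the queue would present first (highest priority)."""
    return max(jobs, key=lambda j: j.priority)


def boost_running(jobs):
    """Helper task: boost running jobs, drop the boost from the rest."""
    for j in jobs:
        j.boost = RUNNING_BOOST if j.running else 0


# A job from the newer dag 200 is already running when the older
# dag 100 creates a fresh idle job.
newer = Job("newer-dag-node", dag_id=200, running=True)
older = Job("older-dag-node", dag_id=100)
queue = [newer, older]

before = head_of_queue(queue).name   # idle job outranks the running one
boost_running(queue)
after = head_of_queue(queue).name    # running job now stays on top
```

Resetting the boost inside boost_running is the "additional hassle"
part: a job that stops running for any reason has to fall back to its
plain negated dag id.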

2 is much better but you do have to decide how you will deal with
fresh jobs from older dags. Dealing with the relevant priority
numbers by hand is a bit of a pain (it would be nice if there was a
way of maintaining relative ordering of idle jobs in the queue without
affecting running ones), but it is reasonable to say that new jobs
from old dags preempt older jobs from newer dags.

If this all sounds like a lot of hassle/complex, it is. I've used this
sort of thing as an interview question quite a few times. An ability
to view your queue as a reorderable pipe which won't reorder a running
job lower than a non-running job would make this rather simpler
(though vastly more complex underneath) and would, I suspect, match
more closely in the vanilla world with the 'expected' or desired
default behaviour. But it doesn't work that way, so it's probably time
to start working round it, sorry.

> The matchlog is normally a reasonable indicator if you are willing to
> go back and look at the jobs which were being compared.
>
There are no rank values in the match log so I don't really see how
these expressions affect the matchmaking process.

You look at what 'won' then go and look at the jobs/machines directly
to work out why...

Matt