Re: [Condor-users] our ideal configuration



On 6/29/07, Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx> wrote:
Matt Hope wrote:
> This is pretty much like ours except if you want the user defined tags
> you have to use startd RANK which triggers preemption. You can avoid
> this if you are willing to use the retirement time to stop the
> resulting preemptions actually causing vacations.
>
Yep, that's what I use, but somehow it's not as effective as it used to be.
Back in the 6.7.x days, when I set the rank of some jobs to a high value,
no low-value job ever started until the high-ranked jobs were serviced.
I can't get the same behaviour with 6.8 and 6.9. Might be an
unintentional config change on my part, though...

What is your claim timeout? Do your users make sure that the higher
tier jobs always have a higher priority? This is easy to do now that
the priority is an int, not plus/minus 20: just add a few million for
each tier.
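
A minimal sketch of that "add a few million for each tier" arithmetic, as a submission helper might compute it. The function name, the tier numbering and the exact gap of one million are assumptions made for illustration; none of them come from Condor itself. The only point is that the gap between tiers must be wider than the range users are allowed within a tier, so a higher tier always sorts first.

```python
# Hypothetical helper: the names and the gap size are illustrative
# assumptions, not part of Condor.
TIER_GAP = 1_000_000  # must exceed the allowed within-tier priority range


def job_priority(tier: int, within_tier: int = 0) -> int:
    """Return an integer job priority in which a higher tier always wins.

    `within_tier` is the user's own ordering inside a tier (for example
    the old -20..+20 range); it is kept below half the gap so that two
    neighbouring tiers can never overlap.
    """
    if not -TIER_GAP // 2 < within_tier < TIER_GAP // 2:
        raise ValueError("within-tier priority must stay inside the tier gap")
    return tier * TIER_GAP + within_tier


if __name__ == "__main__":
    # The lowest job of tier 2 still outranks the highest job of tier 1.
    print(job_priority(2, -20), job_priority(1, 20))
```

The resulting number would then be written as the job priority at submit time by whatever helper application does the submission.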

> Note that mixing user priority and 'tagged' jobs requires some sense
> on the part of the submitting user.
>
That's OK for us, the pool has very few submitters. And abusers are
killed on the spot. ;)

A wonderful arrangement. I considered getting Mike a gun as a pressie.

> If you submit jobs which are tagged high, let them start running, then
> submit a bunch of jobs tagged low but with much higher user priority, then
> the schedd will put these up for negotiation first. This can lead to
> the high ones stopping (not sure if this will still happen - I gave up
> looking into that a while back and just decided to try to keep
> everything stable on our farm so such things didn't happen).
>
I thought that machine rank overrides user priorities altogether. Is it
not true?

Kind of. The scheduling algorithm works by requesting jobs bit by bit
from the schedd/user virtual queue. It was possible under certain
circumstances (say one user takes the whole pool, but IIRC this was not
a requirement) that the scheduler says 'that's it, no need to go
further down the queue, as you would only be comparing against your
jobs that are already running, which wouldn't make sense since they
were further down the queue'.
This means tiers would have real issues with multiple different tier
structures between different groups of machines.
This may no longer be the case (I haven't been through the pie sharing
logic in a long time, so the above might be well out of date).

> If you search the list archives for 'TIER' and RANK you will find
> several very useful threads that show you how to achieve what we
> setup, later ones will include use of retirement to totally prevent
> preemption due to the RANK)
>
I have such a setup running (mostly based on those posts) but it stopped
working efficiently.
It still takes machine rank into account but has much less effect.

Hopefully the two points above will help with that... Ours is now very
stable in that regard, but we ruthlessly enforce the rules (both groups
that use the farm automate the submission through helper applications).
I strongly recommend this approach if you have one or several 'in
the know' users and others who don't want to learn the intricacies and
just want the process to work as roughly a black box.

> We don't do DAGS - solves that one :)
>
Ok, I'll pass that on to my colleagues. ;) We only use DAGs...

There were several changes to the DAG defaults in recent updates - I
remember none of them :)

> That said there were some notes about using the recently added dag id
> and the expansion of the range of the user specified priority to order
> them appropriately, it may not play nice with Tiers without some
> thought on syncing their ordering
>
That's the only solution that I came up with, but I don't like the fact
that job priority is replaced by dagmanjobid using some stellar numbers.
I can divide it of course, but it still feels like a bit of a hacky
approach.

Indeed.

Matt