Re: [Condor-users] our ideal configuration

On 7/2/07, Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx> wrote:
Hi Matt,
> What is your claim timeout?
Well, I have claim_worklife set to 15 minutes and no
request_claim_timeout set.

You spotted my typo - I meant claim_worklife not request_claim_timeout :)
see below:

> Do your users make sure that the higher
> tier jobs always have a higher priority? This is easy to do now that
> the priority is an int, not plus/minus 20. Just add a few million for
> each tier.
Tiers and priority are used completely separately (as I wrote, I was
under the impression that machine rank overrides job and user priority
settings). I'd like to avoid messing with the job priority because it
would make manual priority setting (for changing the job execution order
for a user) a lot more difficult.

This may be a problem - you would need a canonical answer on that
from someone who has looked at the negotiation more recently than I have.
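
For illustration only, the "add a few million for each tier" idea quoted
above can be sketched in Python. The names TIER_STEP, effective_priority
and split_priority are hypothetical helpers, not part of Condor; the
sketch only shows that a tier and a manual per-user ordering can share a
single integer priority and still be separated again afterwards.

```python
# Hypothetical helpers, not a Condor API: pack a tier and a manual
# within-tier priority into one integer job priority.
TIER_STEP = 1_000_000  # assumed spacing between tiers ("a few million")


def effective_priority(tier: int, user_priority: int) -> int:
    """Return one integer in which the tier always dominates."""
    if abs(user_priority) >= TIER_STEP // 2:
        raise ValueError("user priority would spill into another tier")
    return tier * TIER_STEP + user_priority


def split_priority(priority: int) -> tuple[int, int]:
    """Recover (tier, user_priority) from a combined priority."""
    half = TIER_STEP // 2
    tier, offset = divmod(priority + half, TIER_STEP)
    return tier, offset - half
```

Under these assumptions, a job in tier 1 with a manual priority of 10
sorts above any tier 0 job, while the manual value still orders jobs
inside one tier and can be read back with split_priority.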

>> I thought that machine rank overrides user priorities altogether. Is it
>> not true?
> kind of - the scheduling algorithm works by requesting jobs bit by bit
> from the schedd/user virtual queue. It was possible under certain
> circumstances (say one user takes the whole pool but IIRC this was not
> a requirement) that the scheduler says 'that's it no need to go
> further down the queue as you would only be comparing against your
> jobs that are already running which wouldn't make sense since they
> were further down the queue'
> This means tiers would have real issues with multiple different tier
> structures between different groups of machines.
> This may no longer be the case (I haven't been through the pie sharing
> logic in a long time so the above might be well out of date)
Well the situation you write about is way more complex than the problems
I see.
Let's say a user submits two DAGMan jobs. Each DAG submits 100 jobs and
execution starts in the order of job submission. Now I'd like to raise
the tier / rank / importance of the second DAG submitted. So I change the
attributes for all jobs of that DAG and wait. I expect that from this
moment (or after the next negotiation cycle) only the jobs of the second
DAG should start (since they are preferred by the machine rank), but this
is not what I get. A few tasks do start from the second DAG, but quite a
few jobs still start running from the first one.
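
As a rough aid to reasoning about the behaviour described above, here is
a minimal Python sketch of claim reuse. It is a simplified model, not
Condor's implementation: it assumes jobs run back to back on one claim
and that a new job may start only while the claim's age is below
claim_worklife (in seconds). The function name and the exact boundary
condition are assumptions made for illustration.

```python
def jobs_started_on_claim(job_durations, claim_worklife):
    """Count the jobs that start on one claim before it stops taking work.

    Simplified model (an assumption): the first job always starts, and each
    later job starts only if the claim's age is still below claim_worklife.
    """
    started = 0
    claim_age = 0  # seconds since the claim was granted
    for duration in job_durations:
        if started > 0 and claim_age >= claim_worklife:
            break  # claim too old: the machine goes back to negotiation
        started += 1
        claim_age += duration
    return started
```

In this model, five-minute jobs with a 15-minute worklife let three jobs
start on the same claim before the machine is matched again, whereas jobs
that run for 16 minutes, or a worklife of zero, allow only one.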

Well, claim_worklife will account for some of them if some of the
jobs last less than 15 minutes. See what happens if you drop it to
nearly zero or try some dags with fake jobs which just sleep for 16