
Re: [Condor-users] our ideal configuration

Hi Matt,
What is your claim timeout?
Well, I have claim_worklife set to 15 minutes and no request_claim_timeout set.

Do your users make sure that the higher
tier jobs always have a higher priority? This is easy to do now that
the priority is an int, not plus/minus 20. Just add a few million for
each tier.
****Tiers and priority are used completely separately (as I wrote, I was under the impression that machine rank overrides job and user priority settings). I'd like to avoid messing with the job priority because it would make manual priority setting (for changing the job execution order for a user) a lot more difficult.
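
For illustration only, here is a minimal Python sketch of the "add a few million for each tier" idea that still leaves room for a manual per-job adjustment. The names `TIER_STEP`, `encode_priority` and `decode_priority` are hypothetical and not part of Condor; the sketch assumes the job priority is a plain integer and that manual adjustments stay well below half a tier step.

```python
from typing import Tuple

# Assumption: one million priority points separate two adjacent tiers.
TIER_STEP = 1_000_000


def encode_priority(tier: int, manual: int = 0) -> int:
    """Combine a tier number and a manual adjustment into one integer priority."""
    # Keep the manual part strictly inside half a step so tiers never overlap.
    if not -TIER_STEP // 2 < manual < TIER_STEP // 2:
        raise ValueError("manual adjustment too large for the tier step")
    return tier * TIER_STEP + manual


def decode_priority(priority: int) -> Tuple[int, int]:
    """Split a combined priority back into (tier, manual adjustment)."""
    tier = (priority + TIER_STEP // 2) // TIER_STEP
    return tier, priority - tier * TIER_STEP
```

With such a scheme a submission script could keep showing users only the small manual number and add the tier offset itself, so reordering jobs within a tier would stay as simple as it is now.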
That's OK for us; the pool has very few submitters. And abusers are
killed on the spot. ;)
A wonderful arrangement. I considered getting Mike a gun as a pressie.
We have cowboy rules here on the render-farm... ;))

I thought that machine rank overrides user priorities altogether. Is it
not true?
Kind of - the scheduling algorithm works by requesting jobs bit by bit
from the schedd/user virtual queue. It was possible under certain
circumstances (say one user takes the whole pool, but IIRC this was not
a requirement) that the scheduler says 'that's it, no need to go
further down the queue, as you would only be comparing against your
jobs that are already running, which wouldn't make sense since they
were further down the queue'.
This means tiers would have real issues with multiple different tier
structures between different groups of machines.
This may no longer be the case (I haven't been through the pie sharing
logic in a long time, so the above might be well out of date).
Well, the situation you write about is way more complex than the problems I see. Let's say a user submits two DAGMan jobs. Each DAG submits 100 jobs, and execution starts in the order of job submission. Now I'd like to raise the tier / rank / importance of the second DAG. So I change the attributes for all jobs of that DAG and wait. I expect that from this moment (or after the next negotiation cycle) only the jobs of the second DAG should start (since they are preferred by the machine rank), but this is not what I get. A few tasks do start from the second DAG, but quite a few jobs still start running from the first one.
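
To make the "change the attributes for all jobs of that DAG" step concrete, here is a rough Python sketch that only builds a `condor_qedit` command line and does not run it. It assumes the jobs of a DAG can be selected by the `DAGManJobId` job attribute and that the machine rank expression refers to a custom job attribute; the attribute name `Tier`, the cluster id `1234` and the helper `qedit_command` are made up for the example.

```python
import shlex
from typing import List


def qedit_command(dag_cluster_id: int, attribute: str, value: int) -> List[str]:
    """Build a condor_qedit call that edits one attribute on every job of a DAG."""
    # Assumption: jobs submitted by DAGMan carry the DAGMan job's cluster id
    # in their DAGManJobId attribute.
    constraint = f"DAGManJobId == {dag_cluster_id}"
    return ["condor_qedit", "-constraint", constraint, attribute, str(value)]


# Hypothetical values: DAG cluster 1234, custom attribute "Tier" raised to 5.
cmd = qedit_command(1234, "Tier", 5)
print(shlex.join(cmd))
```

This only changes the queued job ads; how quickly the matchmaking picks up the new values is exactly the open question here.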

Hopefully the two points above will help with that... ours is now very
stable in that regard, but we ruthlessly enforce the rules (both groups
that use the farm automate the submission through helper applications
- I strongly recommend this approach if you will have several/one 'in
the know' users and others who don't want to learn the intricacies and
just want the process to work as roughly a black box).
All submission is done through scripts so submission is pretty much controlled.