
RE: [Condor-users] How to have schedd drop claim after each job

This makes more sense - thanks.

The retirement time is always evaluated against the total time the job has spent on this machine on this particular attempt, yes?

> -----Original Message-----
> From: Dan Bradley

> Yes.  This is exactly how I wanted it to be too:)  Let's just say this
> behavior could still be implemented some time in the future, but the
> graceful-claim-retirement feature provides many of the same benefits
> today and even has some advantages.  Consider:
> 1) preemption_requirements only applies to preemption due to user
> priority, not machine rank preemption, the PREEMPT expression, or
> graceful restart/shutdown.  There are good reasons for that, but we
> wanted a simple policy knob that would control all cases where Condor
> kills jobs, so an admin can confidently say, "Machine X will never
> kill jobs of type Y within bound Z."  Of course, the admin can always
> override this later or the power can go down or whatever, but the
> machine policy for job-killing under normal circumstances is
> expressible, and jobs can form ClassAd requirements or rank
> expressions based upon it if they care to.
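A minimal sketch of that single machine-side knob, assuming HTCondor-style startd configuration (the 2-day bound and the requirements example are illustrative, not from the original mail):

```
# Startd config sketch: never kill any job until it has had up to
# 2 days (172800s) to finish, regardless of whether the kill comes
# from user-priority preemption, machine RANK preemption, PREEMPT,
# or a graceful shutdown.
MAXJOBRETIREMENTTIME = 2 * 24 * 60 * 60

# A job that cares can then form requirements against the machine
# policy, e.g. "only run where I am guaranteed at least a day":
# requirements = (TARGET.MaxJobRetirementTime >= 24 * 60 * 60)
```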

So, to clarify the possible reasons a job dies:

state + pre condition
-> behaviour
: post condition

admin/user does condor_rm 
-> no time out (I hope)
: is machine still claimed?

admin/user does condor_vacate 
-> no time out (I hope)              
: is machine still claimed?

PREEMPT evaluates to true 
-> time out?
: I read this as machine unclaimed

user prio is higher and PREEMPTION_REQUIREMENTS evaluates to true
-> retirement timeout
: I read this as machine unclaimed

Graceful shutdown requested 
-> retirement timeout is used instead of the normal graceful timeout?
: I read this as machine unclaimed

Machine ranks a job higher than an existing one - relative user prio immaterial 
-> timeout 
: Unclaimed

It is this last one that is critical to me - if it only works when the user prio is higher, then it's not much use for enforcing job-based rather than user-based allocation policies...
> 2) Once a resource claim is established, the throughput of that claim
> is independent of load or accessibility of the negotiator/collector.
> It is difficult to achieve this if you delay matchmaking until the
> moment in time between one job and the next.  You mentioned several
> ways we also considered addressing this, all at the cost of
> considerable complexity when you get down to the messy
> details--adding state to the matchmaker etc.

I agree the matchmaker details carry considerable potential for side effects - but with the benefit that the complexities you mention below within the retired state disappear. I wouldn't know which was worse without seeing the code :¬)
> >the behaviour I infer from the mail below is
> >
> >1) Machine A Claimed by (at the time) the best job for it.
> >2) New job added to queue (or released / qedited etc. etc.)
> >3) This job evaluates to a higher rank on machine A than the current
> job
> >
> >4a) preemption_requirements evaluates true.
> >5a) the currently running job gets an additional amount of time to
> complete before vacation
> >
> Yes, except the machine policy does not exactly specify how much
> _additional_ time to give to the job before vacation.  It expresses
> the _maximum_ time that will ever be given to the job before
> vacation.  If the job has already run (uninterrupted) for 6 days and
> the maximum retirement time is 2 days, then it will be vacated
> immediately.

That makes more sense
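So the grace actually granted at preemption time is the maximum minus what the job has already run, floored at zero - roughly, in a ClassAd-style sketch (the attribute names here are my assumptions, not taken from the docs):

```
# Sketch: retirement remaining when preemption begins.
# A job that has run 6 days against a 2-day maximum gets
# max(0, 172800 - 518400) = 0, i.e. immediate vacate.
RemainingRetirement = MAX(0, MaxJobRetirementTime - TotalJobRunTime)
```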

> The job is also free to provide its own retirement time that is lower 

and that also makes it more powerful
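For instance, a job that checkpoints hourly might volunteer a retirement time well below the machine's maximum - a submit-file sketch, where the exact attribute syntax for this vintage of Condor is an assumption on my part:

```
# Submit-file sketch: this job only ever needs an hour of grace,
# so offer a retirement time lower than the machine's maximum.
+MaxJobRetirementTime = 3600
```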
> >This is an improvement but does not really provide the desired
> control I list above - since I do not necessarily know in advance how
> long is reasonable to give to a job.
> >
> If you don't want jobs to be interrupted, you have to have some idea
> how long is reasonable.  In a very controlled environment, a month
> might be reasonable.  Or maybe some types of jobs or users should be
> granted more than others.  Whatever makes sense.

Hehe - I'm on Windows, so I don't have the luxury of persistent-state vacation; a month is right out. I have jobs that take minutes to hours across > 100 VMs, but with considerable pressure to reduce latency for high-priority jobs without constantly thrashing lower-priority ones. No existing policy control mechanism in Condor allows me to do this. (There is a regrettably high likelihood of aberrant jobs which immediately die due to user/code errors; I want these to run as fast as possible so the user finds out, but without repeatedly killing off 500 compute-hours of work. Human dev iterations are long enough to waste a lot of time without being so long as to allow significant throughput in the gap.)

There is also the lamentable fact that a user who changes their password and submits a high-priority job without running condor_store_cred will have their job bounce merrily across the farm, trashing all the others. I would dearly love for that to be treated as a terminal error, with the user's jobs shut down until they fix it - with an appropriate error message. If this functionality makes that issue more costly, because a retirement is more acceptable than straight preemption but still causes thrashing, it will hurt.

What I want is

1) Preemption only for critical jobs (clearly marked), or where a job is AWOL (realistically 1 day)
2) A job with a higher rank will ALWAYS start before a job with a lower rank
3) Where ranks are equal spread out resources to users fairly at that instant*
4) User specified priority of their jobs

* in effect, PRIORITY_HALFLIFE = double.MinValue (i.e. usage history decays essentially instantly)
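For what it's worth, points 1-3 might be sketched in config roughly as follows - with the caveat that JobRank is an invented job attribute and the whole thing is a guess at how the knobs would combine, not a tested policy:

```
# Startd: let machine rank, not user priority, order jobs (point 2).
RANK = TARGET.JobRank

# Startd: non-critical work always gets a day of grace; this also
# bounds how long an AWOL job can hold the slot (point 1).
MAXJOBRETIREMENTTIME = 24 * 60 * 60

# Negotiator: decay usage history almost instantly so fair-share
# applies "at that instant" (point 3; value in seconds).
PRIORITY_HALFLIFE = 1
```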

The issue of the -20 to +20 user-specified priority (usp) is tricky, since it could conflict with the machine ranking. I am happy to use it as a differentiator between 1 and 2, which means anyone daft enough to put a clearly (machine-)lower job at higher prio gets what's coming to them and has to wait till their idea of the higher job goes through. In practice this won't happen very often, so we can take the hit.

If the residual claim on the machine is 'fixed' such that it still requires one user's priority to be higher** than another's for it to work, this just isn't going to solve a large part of my problem...

** 'better' rather than higher (which is numerically lower, I know)

> >I suppose I can simulate the above behaviour by pushing this
> retirement timeout very high, but will this lead to issues further
> down the line such as:
> >
> >1) Another machine becomes free but the pending job cannot use it
> >
> When the schedd is waiting for a claim to be ready for use, it will
> periodically go back to the negotiator to see if it can find
> something better.  There's room for making this mechanism more
> sophisticated, but it works today and you can tune it if necessary.

That's OK - that sorts that problem in a reasonably complete manner.
> >2) Another job of even better rank cannot take the pending claim off
> the existing one.
> >
> There's no reason why it can't.  (But am I about to go look at the
> code and double-check this is handled correctly?!)

My apologies, I interpreted it that way - however, will it use the MAX_CLAIM_TIMEOUT setting if, for example, the schedd dies?
> More features generally amount to more complexity:(


> However, it's not so bad.  Every place where there used to be a
> transition from the Claimed state to the Preempting state, there is
> now a transition into Claimed/Retiring where no new jobs may be
> started.  The retirement policy controls what happens from there and
> it's basically just one additional expression: MaxJobRetirementTime.
> There's a little extra complexity under the hood handling such
> potentially desirable things as suspension during retirement and
> unretirement when a preempting claim backs off, but one generally
> just wants it to do the right thing in these circumstances and it
> does:)

So Claimed/Retiring state has a transition to Matched?

What happens if the job is retired - it is sent the signal to vacate if it likes, and then a back-off occurs? Do you send a different signal, or accept that it has started the motions of being vacated and so should be left to finish?

Indeed, when it transitions into the retiring state, is the vacate signal sent immediately, or at ((time of retirement end) - vacate timeout)?

A state diagram would make it clearer, but my ASCII art skills aren't up to much :¬)
> >Does a pending claim count for the purposes of continuing to
> evaluate the cluster?
> >
> Having one job in a cluster waiting on a pending claim does not
> interfere in any way with negotiation for resources to run the other
> jobs in the cluster, if that is what you mean.  As far as the
> negotiator is concerned, the job was matched and is now out of the
> way.

Does it count toward the user's effective priority? Just wondering if this will cause thrashing if a user's priority suddenly jumps because of the claims, over and above what they would have had if the matches had taken immediate effect...

This isn't a big deal to me, since I don't really care much about user prio except where it prevents machine requirements from taking effect.
> I like it too.  Nothing about the claim retirement stuff obviates it.
> However, I'd be curious if graceful claim retirement satisfies your
> needs or whether you still find something significant lacking.

The key thing is whether user_prio considerations prevent it from doing what I want it to :¬)

Thanks for your quick answers - irrespective of the above concerns it will still be useful functionality.

