
Re: [Condor-users] How to have schedd drop claim after each job





Matt Hope wrote:

This is what *I* would like:

1) Machine A Claimed by (at the time) the best job for it.
2) New job added to queue (or released / qedited etc. etc.)
3) This job evaluates to a higher rank on machine A than the current job, but preemption_requirements evaluates to false.
4) When the job finishes the machine causes the release of the current claim and behaves like a fresh machine



Yes. This is exactly how I wanted it to be too:) Let's just say this behavior could still be implemented some time in the future, but the graceful-claim-retirement feature provides many of the same benefits today and even has some advantages. Consider:


1) preemption_requirements only applies to preemption due to user priority, not machine rank preemption, the PREEMPT expression, or graceful restart/shutdown. There are good reasons for that, but we wanted a simple policy knob that would control all cases where Condor kills jobs, so an admin can confidently say, "Machine X will never kill jobs of type Y within bound Z." Of course, the admin can always override this later, or the power can go down, or whatever, but the machine policy for job-killing under normal circumstances is expressible, and jobs can form ClassAd requirements or rank expressions based upon it if they care to.

2) Once a resource claim is established, the throughput of that claim is independent of load or accessibility of the negotiator/collector. It is difficult to achieve this if you delay matchmaking until the moment in time between one job and the next. You mentioned several ways we also considered addressing this, all at the cost of considerable complexity when you get down to the messy details--adding state to the matchmaker etc.

The behaviour I infer from the mail below is:

1) Machine A Claimed by (at the time) the best job for it.
2) New job added to queue (or released / qedited etc. etc.)
3) This job evaluates to a higher rank on machine A than the current job

4a) preemption_requirements evaluates true.
5a) the currently running job gets an additional amount of time to complete before vacation



Yes, except the machine policy does not exactly specify how much _additional_ time to give to the job before vacation. It expresses the _maximum_ time that will ever be given to the job before vacation. If the job has already run (uninterrupted) for 6 days and the maximum retirement time is 2 days, then it will be vacated immediately. The job is also free to provide its own retirement time that is lower than the maximum allowed by the machine. For example, nice-user jobs or similar backfill types of jobs can jump in and jump out without causing anybody to wait for them to finish, even if the machine policy allowed more time.
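As a rough illustration of the arithmetic described above (a sketch only, not Condor's actual implementation; the function and parameter names are hypothetical), the time a job has left before vacation can be modeled as the smaller of the machine's maximum and the job's own retirement time, minus the time the job has already run:

```python
DAY = 24 * 60 * 60  # seconds in one day


def remaining_retirement_time(runtime, machine_max, job_max=None):
    """Seconds the job may keep running once preemption is requested.

    Hypothetical model: these names are illustrative, not Condor identifiers.
    runtime      -- uninterrupted run time of the job so far, in seconds
    machine_max  -- maximum retirement time allowed by the machine policy
    job_max      -- optional retirement time requested by the job itself;
                    it can only lower the limit, never raise it
    """
    limit = machine_max if job_max is None else min(machine_max, job_max)
    # The limit bounds total run time, so time already used counts against it.
    return max(0, limit - runtime)


# A job that has run 6 days against a 2-day maximum is vacated immediately.
print(remaining_retirement_time(6 * DAY, 2 * DAY))
# A backfill-style job asking for no retirement time leaves at once,
# even though the machine would have allowed more.
print(remaining_retirement_time(1 * DAY, 2 * DAY, job_max=0))
```

A job that has run for 1 day under the same 2-day machine maximum, with no limit of its own, would have 1 day left in this model.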



This is an improvement but does not really provide the desired control I list above, since I do not necessarily know in advance how long is reasonable to give to a job.


If you don't want jobs to be interrupted, you have to have some idea how long is reasonable. In a very controlled environment, a month might be reasonable. Or maybe some types of jobs or users should be granted more than others. Whatever makes sense.


I suppose I can simulate the above behaviour by pushing this retirement timeout very high, but will this lead to issues further down the line, such as:

1) Another machine becomes free but the pending job cannot use it


When the schedd is waiting for a claim to be ready for use, it will periodically go back to the negotiator to see if it can find something better. There's room for making this mechanism more sophisticated, but it works today and you can tune it if necessary.


2) Another job of even better rank cannot take the pending claim off the existing one.


There's no reason why it can't. (But am I about to go look at the code and double-check this is handled correctly?!)


3) Management and transitions of state are already complex; this seems to muddy them further.


More features generally amount to more complexity :( However, it's not so bad. Every place where there used to be a transition from the Claimed state to the Preempting state, there is now a transition into Claimed/Retiring, where no new jobs may be started. The retirement policy controls what happens from there, and it's basically just one additional expression: MaxJobRetirementTime. There's a little extra complexity under the hood handling such potentially desirable things as suspension during retirement and unretirement when a preempting claim backs off, but one generally just wants it to do the right thing in these circumstances, and it does :)
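To make the transitions described above concrete, here is a minimal sketch in Python. It is a simplified model under stated assumptions, not the startd's real logic: the `State` enum, `step`, `may_start_new_job`, and their parameters are all hypothetical names, and suspension during retirement is left out.

```python
from enum import Enum


class State(Enum):
    CLAIMED = "Claimed"
    RETIRING = "Claimed/Retiring"
    PREEMPTING = "Preempting"


def step(state, preempt_wanted, job_running, runtime, max_retirement):
    """Return the next state of a claim (hypothetical, simplified model).

    preempt_wanted  -- something (rank, priority, PREEMPT, shutdown) wants the claim
    job_running     -- whether the current job is still running
    runtime         -- seconds the current job has run so far
    max_retirement  -- stand-in for the evaluated MaxJobRetirementTime
    """
    if state is State.CLAIMED:
        # What used to be Claimed -> Preempting now goes through Retiring.
        return State.RETIRING if preempt_wanted else State.CLAIMED
    if state is State.RETIRING:
        if not preempt_wanted:
            # "Unretirement": the preempting claim backed off.
            return State.CLAIMED
        if not job_running or runtime >= max_retirement:
            # Job finished or used up its retirement time.
            return State.PREEMPTING
        return State.RETIRING
    return state


def may_start_new_job(state):
    """No new jobs may be started while the claim is retiring or preempting."""
    return state is State.CLAIMED


s = step(State.CLAIMED, True, True, 0, 3600)
print(s.value, may_start_new_job(s))
```

In this model a job that has already exceeded the retirement limit passes through Claimed/Retiring and moves to Preempting on the following step.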


Does a pending claim count for the purposes of continuing to evaluate the cluster?


Having one job in a cluster waiting on a pending claim does not interfere in any way with negotiation for resources to run the other jobs in the cluster, if that is what you mean. As far as the negotiator is concerned, the job was matched and is now out of the way.


I like C because

a) the admin can tune it.
b) the behaviour is exactly as most people would expect looking at the queue.
c) the _current_ state is always used to determine the next allocated job rather than any previous state.


I like it too. Nothing about the claim retirement stuff obviates it. However, I'd be curious if graceful claim retirement satisfies your needs or whether you still find something significant lacking.


--Dan