This is what I* would like:
1) Machine A Claimed by (at the time) the best job for it.
2) New job added to queue (or released / qedited etc. etc.)
3) This job evaluates to a higher rank on machine A than the current job, but preemption_requirements evaluates to false.
4) When the current job finishes, the machine releases the current claim and behaves like a fresh machine.
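To make steps 1-4 concrete, here is a toy sketch in Python. It is not Condor code and does not use any Condor API; the names Job, Machine, try_preempt and on_job_finished are made up for illustration, and rank is reduced to a single number per job.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    rank: float  # hypothetical: how highly machine A ranks this job


@dataclass
class Machine:
    name: str
    current: Optional[Job] = None

    def on_job_finished(self, queue: List[Job]) -> Optional[Job]:
        """Step 4: release the claim, then match against the queue as it is now."""
        self.current = None  # the machine behaves like a fresh machine
        if not queue:
            return None
        best = max(queue, key=lambda job: job.rank)
        queue.remove(best)
        self.current = best
        return best


def try_preempt(
    machine: Machine,
    candidate: Job,
    preemption_requirements: Callable[[Optional[Job], Job], bool],
) -> bool:
    """Step 3: a higher rank alone is not enough, the policy must also agree."""
    better = machine.current is None or candidate.rank > machine.current.rank
    return better and preemption_requirements(machine.current, candidate)
```

With a policy that always returns False, a better-ranked job queued at step 2 never displaces the running one, and whatever ranks best in the queue at the moment the job finishes is the one that gets the machine.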
The behaviour I infer from the mail below is:
1) Machine A Claimed by (at the time) the best job for it.
2) New job added to queue (or released / qedited etc. etc.)
3) This job evaluates to a higher rank on machine A than the current job.
4a) preemption_requirements evaluates to true.
5a) The currently running job gets an additional amount of time to complete before vacation.
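A minimal sketch of step 5a as I read it, again as a toy model in Python and not Condor's actual logic. The function name and the idea of comparing the job's remaining runtime against a fixed grace period are assumptions made for illustration only.

```python
import math


def outcome_after_preemption(remaining_runtime_s: float, retirement_s: float) -> str:
    """Hypothetical model: the running job gets retirement_s seconds to finish."""
    return "completes" if remaining_runtime_s <= retirement_s else "vacated"


print(outcome_after_preemption(600, 900))        # completes
print(outcome_after_preemption(7200, 900))       # vacated
print(outcome_after_preemption(7200, math.inf))  # completes
```

The outcome depends entirely on whether the chosen grace period happens to cover what the job still needs.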
This is an improvement but does not really provide the control I describe above, since I do not necessarily know in advance how long is reasonable to give to a job.
I suppose I can simulate the above behaviour by pushing this retirement timeout very high, but will this lead to issues further down the line, such as:
1) Another machine becomes free but the pending job cannot use it
2) Another job of even better rank cannot take the pending claim away from the job that currently holds it.
3) Management and transitions of state are already complex; this seems to muddy them further.
Does a pending claim count for the purposes of continuing to evaluate the cluster?
I like C because:
a) the admin can tune it.
b) the behaviour is exactly as most people would expect looking at the queue.
c) the _current_ state is always used to determine the next allocated job rather than any previous state.