
Re: [Condor-users] How to have schedd drop claim after each job

Matt Hope wrote:

So, to clarify the possible reasons a job dies:

state + pre condition
-> behaviour
: post condition

admin/user does condor_rm -> no time out (I hope)
: is machine still claimed?

With condor_rm, the job is removed immediately. The claim is not affected by the operation.

admin/user does condor_vacate -> no time out (I hope)
: is machine still claimed?

condor_vacate has two different modes: -fast and -graceful. -fast kills the job immediately. -graceful lets the normal startd policy take effect, so if (and only if) retirement time was promised and accepted, it will be given. The machine is unclaimed after the job vacates.
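The two condor_vacate modes can be summarised as a small decision function. This is a hypothetical Python model of the behaviour described above, not Condor code. The function name `retirement_on_vacate` and its parameters are illustrative assumptions, and retirement time is treated as a plain number of seconds.

```python
from typing import Optional


def retirement_on_vacate(mode: str, promised: Optional[int], accepted: bool) -> int:
    """Hypothetical model: retirement seconds honoured after condor_vacate.

    mode: "fast" or "graceful", mirroring condor_vacate -fast / -graceful.
    promised: retirement time the machine promised, or None if none.
    accepted: whether the job accepted the promised retirement time.
    """
    if mode == "fast":
        # -fast kills the job immediately; no retirement time is given.
        return 0
    if mode == "graceful":
        # -graceful follows the normal startd policy: retirement time is
        # honoured if (and only if) it was both promised and accepted.
        if promised is not None and accepted:
            return promised
        return 0
    raise ValueError(f"unknown vacate mode: {mode!r}")


print(retirement_on_vacate("fast", 3600, True))      # 0
print(retirement_on_vacate("graceful", 3600, True))  # 3600
print(retirement_on_vacate("graceful", 3600, False)) # 0
```

In either mode the machine is unclaimed once the job has vacated, so the model only returns the retirement allowance. It deliberately leaves out the ordinary graceful eviction steps that follow retirement.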

PREEMPT evaluates to true -> time out?
: I read this as machine unclaimed

If retirement time was promised and accepted, it is given. This implies that for machines where immediate PREEMPT is desired, you would not normally want to promise any retirement time.
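To make the PREEMPT case concrete, here is a small worked-arithmetic sketch. It assumes that retirement time is counted from when the job started running, so a job that has already run longer than its allowance is evicted as soon as PREEMPT fires; check the MAXJOBRETIREMENTTIME documentation for your Condor version before relying on that. The function `eviction_time` and its parameters are hypothetical.

```python
def eviction_time(job_start: float, preempt_at: float,
                  retirement: float, accepted: bool) -> float:
    """Hypothetical model: latest time a job runs once PREEMPT becomes true.

    Assumes retirement time is counted from job start (an assumption).
    The job may of course finish on its own before this time.
    """
    if not accepted or retirement <= 0:
        # No retirement time in force: PREEMPT takes effect immediately.
        return preempt_at
    # Otherwise the job keeps running until its allowance is used up,
    # or is evicted at once if the allowance has already elapsed.
    return max(preempt_at, job_start + retirement)


# Job started at t=0 s.
print(eviction_time(0, 600, 3600, True))   # 3600
print(eviction_time(0, 600, 0, True))      # 600
print(eviction_time(0, 5000, 3600, True))  # 5000
```

The second call corresponds to the recommendation above for machines where immediate PREEMPT is wanted: with no retirement time promised, eviction happens the moment PREEMPT fires.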

user prio is higher and PREEMPTION_REQUIREMENTS evaluates to true
-> retirement timeout
: I read this as machine unclaimed

Yes. If the machine promised the job some retirement time and the job accepted it, then it certainly applies in this case. Once the job retires, the machine is claimed by the preempting job.

Graceful shutdown requested -> retirement timeout is used instead of the normal graceful timeout?
: I read this as machine unclaimed

Actually, there are three different shutdown modes: -fast, -graceful, and -peaceful. -fast causes immediate shutdown. -graceful will obey the retirement policy except it can timeout earlier due to GRACEFUL_SHUTDOWN_TIMEOUT, basically giving you control over how this case works. -peaceful effectively grants infinite retirement time and bypasses the graceful shutdown timeout. These all apply to condor_restart as well.
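The three shutdown modes differ in how long a running job may keep retiring before the shutdown takes it down. Below is a hypothetical Python model of that rule, treating GRACEFUL_SHUTDOWN_TIMEOUT (the name used above) as a number of seconds; the function name and parameters are illustrative assumptions. Per the reply, the same logic would apply to condor_restart.

```python
import math


def shutdown_wait(mode: str, retirement: float, graceful_timeout: float) -> float:
    """Hypothetical model: seconds a running job may keep retiring after
    a shutdown request.

    retirement: retirement time promised to and accepted by the job (0 if none).
    graceful_timeout: stands in for the GRACEFUL_SHUTDOWN_TIMEOUT setting.
    """
    if mode == "fast":
        # -fast: immediate shutdown.
        return 0.0
    if mode == "graceful":
        # -graceful: obey the retirement policy, but stop waiting if the
        # graceful shutdown timeout expires first.
        return min(retirement, graceful_timeout)
    if mode == "peaceful":
        # -peaceful: effectively infinite retirement; the timeout is bypassed.
        return math.inf
    raise ValueError(f"unknown shutdown mode: {mode!r}")


print(shutdown_wait("graceful", 7200, 1800))  # 1800
print(shutdown_wait("graceful", 600, 1800))   # 600
print(shutdown_wait("peaceful", 7200, 1800))  # inf
```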

Machine ranks a job higher than an existing one (relative user prio immaterial)
-> timeout
: unclaimed

It is this last one that is critical to me: if it only works when the user prio is higher, then it's not much use for implementing job-based rather than user-based allocation policies...

Exactly. Since the retirement time is part of the _machine_ policy rather than the negotiator policy, it applies even in the case of startd rank preemption, unlike PREEMPTION_REQUIREMENTS. Whatever retirement time you configure (it could be specific to the types of jobs the machines prefer) applies in this case as well. Once the job vacates, the new claim replaces the old one.
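The distinction between the two preemption paths can be sketched as a hypothetical Python model: PREEMPTION_REQUIREMENTS gates only user-priority preemption, while retirement time, being machine policy, applies on both paths. The `Outcome` class, `preemption_outcome` function and parameter names are illustrative assumptions, and the model presumes the relevant precondition (better user priority, or higher startd RANK) has already been established.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    preempts: bool             # is the running job evicted at all?
    retirement_honoured: bool  # does the agreed retirement time apply?
    claim: str                 # what happens to the claim afterwards


def preemption_outcome(kind: str, preemption_requirements: bool,
                       new_job_ranked_higher: bool,
                       retirement_agreed: bool) -> Outcome:
    """Hypothetical model of the two preemption paths discussed above.

    kind: "priority" (negotiator preempts because the new user has better
          priority) or "rank" (the startd's RANK prefers the new job).
    """
    if kind == "priority":
        # Negotiator policy: PREEMPTION_REQUIREMENTS must allow it.
        preempts = preemption_requirements
    elif kind == "rank":
        # Machine policy: only the startd's RANK matters here; user priority
        # and PREEMPTION_REQUIREMENTS are not consulted.
        preempts = new_job_ranked_higher
    else:
        raise ValueError(f"unknown preemption kind: {kind!r}")

    if not preempts:
        return Outcome(False, False, "existing claim kept")
    # Retirement time is machine policy, so it applies on both paths.
    return Outcome(True, retirement_agreed,
                   "new claim replaces the old one once the job retires")


# Rank preemption goes ahead even though PREEMPTION_REQUIREMENTS is false.
print(preemption_outcome("rank", False, True, True).preempts)      # True
print(preemption_outcome("priority", False, True, True).preempts)  # False
```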