RE: [condor-users] RE: clarification required please

> -----Original Message-----
> From: owner-condor-users@xxxxxxxxxxx
> [mailto:owner-condor-users@xxxxxxxxxxx]On Behalf Of Mark Silberstein
> Sent: 11 May 2004 05:19
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [condor-users] RE: clarification required please
> > B) Preemption
> > from the supplied config file
> > 
> > ##  The negotiator will not preempt a job running on a given machine
> > ##  unless the PREEMPTION_REQUIREMENTS expression evaluates to true
> > ##  and the owner of the idle job has a better priority than the owner
> > ##  of the running job.  This expression defaults to true.
> > UWCS_PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2
> > 
> > Does this mean that the user prio must be better in addition to
> > PREEMPTION_REQUIREMENTS evaluating to true, or that this particular
> > expression is what causes this?
> This expression is evaluated by the Negotiator to decide whether to
> preempt your job or not. So the documentation is misleading, i.e. this
> expression defines the overall behavior. Thus, if you remove
> (RemoteUserPrio > SubmittorPrio * 1.2), then there will be no user
> priority consideration for preemption at all.

This seems to disagree with Alain's answer - see previous post.
Perhaps I am missing something though.
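To make the question concrete, here are the two variants under discussion as a config sketch. In Condor a numerically higher user priority value means a worse priority, so the priority clause only allows preemption when the running job's owner is more than 20% worse than the idle job's submitter. Which reading of the negotiator is correct is exactly the open question, so treat this as something to test on a small pool rather than as settled behaviour.

```
## Variant 1: the stock UWCS expression (time clause AND priority clause).
## Preempt only once the machine has been in its current state for an hour
## and the running job's owner has a priority value more than 1.2x that of
## the idle job's submitter (higher value = worse priority).
PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2

## Variant 2: time clause only. On Mark's reading, user priority then plays
## no part in preemption; on Alain's, the negotiator still checks it itself.
#PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR))
```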

> > C) Vacation
> > want_vacate = False
> > 
> > there is no definition for want_vacate_vanilla 
> > 
> > vanilla jobs do not immediately go to the killing state 
> > they remain in the preempting state till the timeout expires 
> > (we were using the default UWCS value for KILL as I thought 
> > it would not matter)

> I guess you should put WANT_SUSPEND=FALSE. It might be the case that
> you actually don't see the real state of the job. Then your
> WANT_VACATE=FALSE is in place, and it goes directly to KILL.

WANT_SUSPEND _is_ false. The job is not spending time in the Claimed/Suspended state; it is (at least as far as condor_status shows) spending 10 minutes in the Preempting/Vacating state.
Again, the state transition diagram is pretty clear that WANT_SUSPEND has no effect on the vacation behaviour...

This is not a big deal here, since the kill evaluation is quick enough that I don't see any discernible delay in the additional transition. But you could be losing a lot of throughput where you have mostly vanilla universe jobs and are using preemption (though that is a problem in and of itself).
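If the time in Preempting/Vacating is just the KILL timeout running out, one way to bound it is to override KILL locally. This is a sketch under assumptions: it assumes the stock file defines ActivityTimer as (CurrentTime - EnteredCurrentActivity) and MINUTE as 60, and MaxVacateTime is a hypothetical macro name used here for readability. Check both against your own condor_config before copying this.

```
## Sketch only: bound how long a job may sit in Preempting/Vacating before
## the startd moves it to Preempting/Killing.
## Assumes ActivityTimer = (CurrentTime - EnteredCurrentActivity) is defined
## in the stock file; MaxVacateTime is a hypothetical helper macro.
MaxVacateTime = 1 * $(MINUTE)
KILL = $(ActivityTimer) > $(MaxVacateTime)
```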

Condor is a wonderfully powerful tool, but it would be nice if there were a better explanation somewhere of how to disable the features unnecessary for pools in more controlled environments (where authentication is less needed, machines are more homogeneous, and the driving queueing mechanism is not user_prio but job-specific, such as the oft-requested TIER pattern).

For example, the default UWCS config used on the Windows install has
PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2
For jobs lasting a few hours this behaves in a degenerate fashion (1), killing the old jobs rather than the new ones and wasting hours of compute time.

A reasonable set of defaults for this universe would be useful.
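As a starting point, here is a minimal sketch of what such defaults might look like for a dedicated, homogeneous pool where running jobs should never be suspended or preempted. The knobs below are the standard startd and negotiator policy settings, but this combination is an untested sketch, not a recommendation; check the manual for your version before relying on it.

```
## Sketch of a "never interrupt running jobs" policy for a dedicated pool.
## Startd side: always start jobs, never suspend, preempt or kill them.
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
## Machine RANK constant, so no job is preferred over a running one.
RANK = 0
## Negotiator side: never preempt a running job on user priority grounds.
PREEMPTION_REQUIREMENTS = False
```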


(1) For cases where the vacate SIGTERM is not handled appropriately, which is probably most cases when first using Condor on Windows.

Gloucester Research Limited believes the information 
provided herein is reliable. While every care has been 
taken to ensure accuracy, the information is furnished 
to the recipients with no warranty as to the completeness 
and accuracy of its contents and on condition that any 
errors or omissions shall not be made the basis for any 
claim, demand or cause for action.
