
RE: [Condor-users] When do machine RANK settings apply?



I thought some follow-up would be nice. Everything is working very well
in our system now, and there may be others in the future looking for a
way to nudge condor towards priority-based, unfair resource scheduling.
This has been working as expected in our test system, under heavy load,
for the last 6 days.

Thanks to Matthew Hope, the Condor Team and others on this list for all
your help with this. What I'm about to show you may not seem like a
whole lot, but it was an undertaking to gather the information required
to push the system to conform to our scheduling needs. That I was able
to do this is a testament to just how configurable Condor really is
underneath it all.

I'll start by saying that this tweak to Condor is entirely dependent on
the MaxJobRetirementTime feature. Making these policy changes without
excessively long MaxJobRetirementTime values for your jobs will lead to
much thrashing. With the long MaxJobRetirementTime your vanilla jobs
simply show their states as "Retiring" more frequently than they show
"Busy" -- there is (generally) always another job gunning for your
machine if you use this policy. We keep a minimum MaxJobRetirementTime
of 2 days on our vanilla jobs and a maximum of 2 weeks. Our users can
adjust the retirement length if they expect their jobs to be
particularly long. Our advice is to set this to a comfortable value: if
a job ran longer than it, you would suspect the job of having gone into
some errant, infinite loop.
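
To make that concrete, here's roughly how the retirement knobs could be
wired up (a sketch with a made-up job name, not a verbatim dump of our
config -- and the min/max semantics are my reading of the 6.7 manual:
the startd's expression acts as a ceiling that a job's own attribute
can lower but not raise).

In the startd config:

MAXJOBRETIREMENTTIME = 14 * 24 * $(HOUR)

In a job's submit description file:

executable = my_long_sim
+MaxJobRetirementTime = (2 * 24 * 60 * 60)
queue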

In our startd policy:

RANK = (TARGET.JobPrio * 2880) + ((TARGET.JobStatus =?= 1) * ((CurrentTime - TARGET.EnteredCurrentStatus)/60))

This ranks jobs on machines based on their user-assigned priority. It
has the nice benefit of emphasizing jobs in the order users would like
them to run on their schedd. The * 2880 spreads out the priorities
(JobPrio can be [-20:20]). The remainder of the expression automatically
increments the rank by 1 for every minute the job sits idle. This is an
anti-starvation technique that helps lower priority jobs build up in
significance as they sit in the queue. This auto-increment, combined
with the multiplication factor of 2880, means that a job sitting idle
for 48 hours has its rank boosted by one full priority level. If its
JobPrio was set to -10, then after 48 hours of idle time it is competing
for resources as if it had JobPrio set to -9. The boost gets reset if
the job leaves the idle state and then returns to it.
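
To make the arithmetic concrete (numbers worked by hand, not pulled
from a log):

  JobPrio = -10, just went idle:  RANK = (-10 * 2880) +    0 = -28800
  JobPrio = -10, idle 48 hours:   RANK = (-10 * 2880) + 2880 = -25920
  JobPrio =  -9, just went idle:  RANK = ( -9 * 2880) +    0 = -25920

Two full days in the queue buys an idle job exactly one JobPrio level.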

In our negotiator policy:

PRIORITY_HALFLIFE = 1

We don't care about user priority. We would rather have user priority
adjust quickly to represent the current state of the system than slowly
to prevent thrashing.
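
For a sense of scale (my arithmetic, from the halflife decay described
in the manual): PRIORITY_HALFLIFE is in seconds and defaults to 86400,
so accumulated usage normally takes a full day to decay by half. At a
halflife of 1, the weight left on old usage after even a minute is
0.5^60 -- effectively zero -- so a user's priority reflects what they
are running right now and nothing else.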

PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (1 * $(HOUR)) && (TARGET.RANK - MY.RANK) > 2880

We've dropped any reference to user priorities in the
PREEMPTION_REQUIREMENTS setting. We had originally set this to FALSE,
but that left schedds in the system with the ability to hang on to
startds for a long time, even if newer, higher priority jobs entered the
system. The left side of the && alone was sufficient, but because of
our anti-starvation RANK setting it meant that running jobs from
cluster A were being retired by jobs from cluster A that had been idle
for only a few seconds (because they immediately started to rank
higher). The right side of the && was added to adjust for this: we now
only allow preemption if the idle job is a full priority level higher
than the running job. This has worked out very well. Remember that we
don't preempt immediately in our system; we simply send jobs into a
long retirement phase. This just ensures that the startd returns the
resource to the negotiator for reassignment when a higher priority job
enters the system, instead of continually accepting jobs from one
schedd.
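
A quick sanity check of that right-hand side against the RANK
expression above (hand-worked numbers, not negotiator output):

  Running job: JobPrio = 5                 RANK = 14400
  Idle job:    JobPrio = 5, idle 30 min    RANK = 14430  (difference   30: no)
  Idle job:    JobPrio = 5, idle 49 hours  RANK = 17340  (difference 2940: yes)
  Idle job:    JobPrio = 6, idle 2 min     RANK = 17282  (difference 2882: yes)

An equal-priority job has to sit idle for more than 48 hours before it
can start retiring a running job; a job one JobPrio level higher
qualifies almost immediately.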

This method works very well in our setting, where all of our jobs run
as vanilla jobs. Job execution and scheduling are consistent and
predictable.

- Ian



> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
> Sent: January 5, 2005 3:58 PM
> To: Matt Hope
> Cc: Condor-Users Mail List
> Subject: RE: [Condor-users] When do machine RANK settings apply?
> 
> > > Looking at the NegotiatorLog it wants to preempt bchan's jobs for
> > > mine but it can't because PREEMPTION_REQUIREMENTS are false. I
> > > think what I'm observing here is that bchan's schedd is holding on
> > > to the startd machine after a job finishes and just running the
> > > next job in her list. Why is my higher-ranking job not taking over
> > > this machine?
> > 
> > That is an issue - essentially if a user retains a claim to the
> > machine then they can keep sending lower priority jobs to it. It
> > seems the negotiator <annoyingly> decides that it tried checking
> > preemption based on the user priority being higher, that said no, so
> > it won't bother checking if the machine rank makes a difference...
> 
> I think, for my vanilla jobs in conjunction with my very long
> job retirement time then, I should be safe and perhaps better
> off saying:
> 
> PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (1 * (60 * 60)) && MYRANK < TARGET.RANK
> 
> > Just to check: if you release all those jobs at the same time (with
> > only 2 machines to execute the three of them) so that a single
> > negotiation cycle happens, does the right allocation occur?
> 
> I'll have to test this. I'll need to get my two other users
> to submit some dummy jobs.
> 
> > I was aware of the problem you describe on 6.6 (I very occasionally
> > have to execute a condor_vacate to force things to realign if two
> > users have identically tiered jobs but one got a 'head start' and is
> > therefore holding onto it) but the 6.7 retirement in theory should
> > have allowed me to enable user preemption where a slight disparity
> > exists coupled with max job retirement to avoid thrashing.
> 
> Right. This is what's got me thinking that I'm better off
> allowing preemption based on ranks using
> PREEMPTION_REQUIREMENTS. I won't thrash because of
> MaxJobRetirementTime. Although, I've since tweaked my setting
> back to allowing preemption and bchan has still got a firm
> hold on that startd.
> 
> > All is not lost though - I think you may have forgotten about your
> > 2 day retirement time... the negotiator does recheck when a
> > "preemption pending retirement" exists in case the preempting job
> > goes away, this lets the retirement be withdrawn.
> > 
> > If the retirement is present but the schedd is still accepting jobs
> > then that's a BUG (didn't someone else mention this a while back,
> > did it get identified/resolved)...
> 
> I'm not sure what this means. Is there a way I can check for this?
> 
> > Anyone at cs.wisc who can see why this might be happening, please
> > do chip in here, but I'm hitting a brick wall now.
> > 
> > Clearly more than one group would like to use condor in a "Job then
> > User" setting; condor, for all its vaunted flexibility, does not
> > make this easy (jury still out on whether it's possible at all). I
> > see the reasons it doesn't, since considerable optimization of the
> > startd/negotiator comms overhead can be performed this way.
> > However these optimizations make what we are attempting to do
> > excruciatingly unpleasant.
> 
> Agreed. I'll take a slowdown and some gross inefficiency in
> the negotiator to get to where I want to be today: job
> priority based scheduling and not user based scheduling.
>  
> 
> Ian
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx 
> http://lists.cs.wisc.edu/mailman/listinfo/condor-users
>