
RE: [Condor-users] When do machine RANK settings apply?



> > Looking at the NegotiatorLog it wants to preempt bchan's jobs for mine
> > but it can't because PREEMPTION_REQUIREMENTS are false. I think what
> > I'm observing here is bchan's schedd holding on to the startd
> > machine after a job finishes and just running the next job in her
> > list. Why is my higher-ranking job not taking over this machine?
> 
> That is an issue - essentially if a user retains a claim to 
> the machine then they can keep sending lower priority jobs 
> to it. It seems the negotiator <annoyingly> decides that, having 
> already checked preemption based on user priority and been told 
> no, it won't bother checking whether the machine rank makes a 
> difference...

I think, for my vanilla jobs in conjunction with my very long job
retirement time, I should be safe and perhaps better off saying:

PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (1 * (60 * 60)) && MYRANK < TARGET.RANK
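
To spell out what I'm hoping that expression does, here is a rough sketch in plain Python. It is not Condor's ClassAd evaluator, and the function and parameter names are made up for illustration. It also assumes MYRANK stands for the machine's rank of the job it is running now and TARGET.RANK for its rank of the candidate job, which may not be how Condor actually resolves those names.

```python
# Hypothetical model of the expression above. This is NOT how Condor
# evaluates ClassAds; all names here are invented for illustration.
HOUR = 60 * 60  # seconds


def may_preempt(current_time, entered_current_state, current_rank, candidate_rank):
    """Return True when the slot has been in its current state for more
    than one hour AND the candidate job ranks strictly higher than the
    job that is running now (assumed reading of MYRANK < TARGET.RANK)."""
    in_state_long_enough = (current_time - entered_current_state) > 1 * HOUR
    candidate_ranks_higher = current_rank < candidate_rank
    return in_state_long_enough and candidate_ranks_higher
```

Both halves have to hold: a higher-ranked job arriving inside the first hour would not preempt, and neither would an equally ranked job arriving later.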

> Just to check: if you release all those jobs at the same time 
> (with only 2 machines to execute the three of them) so that 
> a single negotiation cycle happens, does the right allocation occur?

I'll have to test this. I'll need to get my two other users to submit
some dummy jobs.

> I was aware of the problem you describe on 6.6 (I very 
> occasionally have to execute a condor_vacate to force things 
> to realign if two users have identically tiered jobs but one 
> got a 'head start' and is therefore holding onto it) but the 6.7 
> retirement in theory should have allowed me to enable user 
> preemption where a slight disparity exists coupled with max 
> job retirement to avoid thrashing.

Right. This is what's got me thinking that I'm better off allowing
preemption based on ranks using PREEMPTION_REQUIREMENTS. I won't thrash
because of MaxJobRetirementTime. Although, I've since tweaked my setting
back to allowing preemption and bchan has still got a firm hold on that
startd.
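
My reasoning on the thrashing point, as a rough Python sketch rather than anything Condor actually runs: as I understand it, retirement time is counted from when the job started, so a preempted job gets whatever is left of that allowance before it is evicted. The function name is made up, and the two-day default is just the retirement time mentioned in this thread.

```python
# Hypothetical model of job retirement; names invented for illustration.
# Assumes retirement is measured from the start of the job's run.
TWO_DAYS = 2 * 24 * 60 * 60  # seconds


def seconds_until_eviction(job_runtime, max_job_retirement_time=TWO_DAYS):
    """Seconds a preempted job may keep running before it is evicted."""
    return max(0, max_job_retirement_time - job_runtime)
```

Under that assumption a job that has only just started keeps nearly the full two days even if a higher-ranked job shows up immediately, which is why I don't expect thrashing.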

> All is not lost though - I think you may have forgotten about 
> your 2 day retirement time... the negotiator does recheck 
> when a "preemption pending retirement" exists in case the 
> preempting job goes away; this lets the retirement be withdrawn.
> 
> If the retirement is present but the schedd is still 
> accepting jobs then that's a BUG (didn't someone else mention 
> this a while back? Did it get identified/resolved?)...

I'm not sure what this means. Is there a way I can check for this?

> If anyone at cs.wisc can see why this might be happening, 
> please do chip in here, but I'm hitting a brick wall now.
> 
> Clearly more than one group would like to use condor in a 
> "Job then User" setting. Condor, for all its vaunted 
> flexibility, does not make this easy (jury still out on 
> possible). I see the reasons it doesn't, since 
> considerable optimization of the startd/negotiator comms 
> overhead can be performed this way.
> However, these optimizations make what we are attempting to do 
> excruciatingly unpleasant.

Agreed. I'll take a slowdown and some gross inefficiency in the
negotiator to get to where I want to be today: job priority based
scheduling and not user based scheduling.
 

Ian