
Re: [Condor-users] Defusing a Condor bomb?



On Wed, Oct 26, 2005 at 04:52:12PM +0100, Matt Hope wrote:
> On 10/26/05, Chris Green <greenc@xxxxxxxx> wrote:
> > Hi,
> >
> > So I have a cluster of about 80 nodes, with file access for users spread
> > over a handful of servers. I monitor the load on each server and don't let
> > a user's job start if his server's load is too high (everyone uses a
> > central submission script so requirements get added to the .cmd file). The
> > problem is, if a user submits a bunch of jobs to an almost empty queue,
> > they could all start running (and bring the server to its knees) before
> > the load monitor notices. Is there some way to throttle the frequency at
> > which jobs from the same user start executing to prevent this happening?
> 
> same user no (I think). same schedd yes...
> 
> http://condor.optena.com/display/CONDOR/JOB_START_DELAY
> 

There will be a new attribute in the next 6.6 release (and so
in the next 6.7 release as well) that lets JOB_START_DELAY be set on a
per-job basis. It's not perfect, because you're not guaranteed
any sort of order, but if you know a particular job is going to be big
you can stall the schedd before it tries the next one. It's called
NextJobStartDelay.
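
To make the effect concrete, here is a rough Python sketch of the idea. It is
a toy model only, not Condor code: the function name start_times, the 2 second
default, and the 60 second per-job value are all made up for illustration. It
assumes the schedd starts jobs one at a time and, after each start, waits
either the global delay or that job's own delay if it has one.

```python
def start_times(per_job_delays, default_delay=2):
    """Return the second at which each job starts in a toy schedd model.

    per_job_delays[i] is a hypothetical per-job delay (in seconds) that the
    schedd waits after starting job i; None means use default_delay instead.
    """
    now = 0
    times = []
    for delay in per_job_delays:
        times.append(now)  # this job starts now
        # Stall before the next start: per-job value wins over the default.
        now += default_delay if delay is None else delay
    return times


# Three jobs; the second is known to be big, so it asks for a 60 s pause.
print(start_times([None, 60, None]))  # prints [0, 2, 62]
```

Since start order isn't guaranteed, the big job's pause lands wherever that
job happens to fall in the sequence.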

> note however that the jobs will have been matched so you would have to
> then prevent them from starting - this would be rather inefficient.
> 
> I do not believe there is any easy way of only allowing x jobs to be
> negotiated and accepted at a time (which is what you would need to
> do).

There is an option to the negotiator that controls how long to spend per
user in a negotiation cycle, but it limits only time, not the number of jobs.
If you set that small, and use a long enough NEGOTIATOR_INTERVAL, you
could effectively match only X jobs per cycle.
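
As a back-of-the-envelope check, the arithmetic looks something like the
Python below. Everything here is an assumption for illustration: the function
names are invented, and the time a single match takes is a guess you would
have to measure on your own pool. It also assumes each match costs roughly
the same amount of negotiator time.

```python
def matches_per_cycle(time_budget_s, seconds_per_match):
    """Jobs one user can get matched within a per-user time budget.

    Both arguments are whole seconds; seconds_per_match is an assumed,
    roughly constant cost per match, not a value Condor reports.
    """
    return time_budget_s // seconds_per_match


def max_matches_per_hour(time_budget_s, seconds_per_match, negotiator_interval_s):
    """Upper bound on one user's matches per hour under this toy model."""
    cycles_per_hour = 3600 // negotiator_interval_s
    return cycles_per_hour * matches_per_cycle(time_budget_s, seconds_per_match)


# 2 s budget per user, ~1 s per match, one negotiation cycle every 300 s.
print(matches_per_cycle(2, 1))          # prints 2
print(max_matches_per_hour(2, 1, 300))  # prints 24
```

So the X you end up with is only approximate, since it depends on how fast
the negotiator gets through matches in practice.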

-Erik