
Re: [Condor-users] how to limit no of running jobs ?





--On 05 June 2006 10:54 +0100 Matt Hope <matthew.hope@xxxxxxxxx> wrote:

On 6/5/06, Dr Ian C. Smith <i.c.smith@xxxxxxxxxxxxxxx> wrote:
Hi,

Very simple question. We have a user who
wants to run several thousand Condor jobs but from
a sysadmin point of view I'd prefer it if only, say,
50 ran at any one time. Is there a way of putting
a limit on the number of concurrently running jobs
in the submission script or elsewhere ?

There is no general-case solution. The easiest approach that certainly
works (I use it, as do several others on this list) is:

Allocate a dedicated schedd somewhere for these jobs, set
MAX_JOBS_RUNNING on that schedd to the number you will allow, and have
all the restricted jobs (and only those jobs) run from that schedd machine.

Other people have partitioned jobs in a similar way via accounting groups
and specific users.

An alternative is to have a monitoring job keep an eye on things and
hold/release jobs as required (this is very flexible but a pain to
maintain, since you have to keep its polling from screwing up your farm).
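
A minimal sketch of the decision step such a monitoring job might make on
each poll. It assumes one possible policy that is not spelled out above:
the restricted jobs start out held and are released as capacity frees up.
The function name and its inputs are hypothetical; gathering the counts
(for example from condor_q) and acting on the result (for example with
condor_release) are deliberately left out.

```python
def plan_releases(running, idle, held_ids, limit):
    """Pick held job IDs to release on this poll (hypothetical helper).

    running  -- number of restricted jobs currently running
    idle     -- number already released but not yet running
    held_ids -- IDs of restricted jobs still on hold, oldest first
    limit    -- maximum number allowed to run at once (e.g. 50)
    """
    # Count idle jobs against the limit too, so we never release
    # more than could start without exceeding it.
    headroom = max(0, limit - running - idle)
    return list(held_ids[:headroom])
```

Polling infrequently keeps the load on the schedd down, at the cost of
slots sitting empty for a while after jobs finish.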

If you only ever submit one cluster of these jobs at a time, you
could pretend it was a DAG and use DAGMan's functionality to
restrict the maximum number of active jobs. This won't work if more
jobs are being submitted every so often, though...
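
As a rough sketch of the DAG approach, the snippet below generates a DAG
file with one independent node per job and no parent/child dependencies.
The function name, the node names, the "index" macro and the file names
are all made up for illustration; it assumes a single submit file that
queues one job and reads $(index) to tell the jobs apart.

```python
def make_dag(submit_file, n_jobs):
    """Return DAG file text with n_jobs independent nodes (hypothetical helper)."""
    lines = []
    for i in range(n_jobs):
        # One node per job; every node reuses the same submit file.
        lines.append(f"JOB node{i} {submit_file}")
        # Pass the job number to the submit file as the macro $(index).
        lines.append(f'VARS node{i} index="{i}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "job.sub" and "jobs.dag" are placeholder names.
    with open("jobs.dag", "w") as f:
        f.write(make_dag("job.sub", 3000))
```

The resulting file could then be submitted with something like
"condor_submit_dag -maxjobs 50 jobs.dag", so that DAGMan keeps at most
50 of the nodes submitted at any one time.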

This should probably be a FAQ entry since it is a reasonable request
many people have.

Matt

Hi Matt,

Thanks for the speedy reply. I always thought this was part of the
Condor functionality but apparently not. The reason I ask is
that I can see two different groups of our Condor users developing.
The first run small numbers of long (as in weeks) jobs under DAGMan,
the second will be running large numbers of short (~ 30 mins) jobs without
DAGMan.
I'm worried that jobs from the first group will be edged out by the
second - is this likely to be the case ? Should I in some way increase
the priority of the long jobs ?

I've never quite understood how Condor shares resources between users.
For schedulers like Sun Grid Engine there are a variety
of policies which can be employed.

cheers,

-ian.