
Re: [Condor-users] reducing job start time



On Apr 4, 2008, at 4:25 AM, Jos Houtman wrote:

I am wondering if there are ways to improve the job start time (the
time between submit and actual startup).
My plan is to use Condor to run queue-processors, which are submitted by
a manager that makes sure we keep up with the queue. The manager also
runs in the cluster.

Because we want to keep queue processing times low, a worker normally
only works on a few queue items.
At the moment this leads to an average runtime of 2 seconds for a
worker.
This makes anticipating and scheduling workers for the manager harder
because the average time from submit to running a worker is about 17
seconds.

I was wondering if the job start time could be reduced even more?
I already lowered the NEGOTIATOR_INTERVAL to 15 seconds and tried
running condor_reschedule after a submit.
The cluster will consist of about 20 quad-core nodes, but any solution
should also scale to ten times that size.


Condor isn't designed to run many 2-second jobs efficiently. But there are a couple of things you can try to reduce the queue time of your jobs:

* Change NEGOTIATOR_CYCLE_DELAY in the config file. This sets the minimum time between negotiation cycles and defaults to 20 seconds.

* It can take a while for the negotiator to match a job with a machine. But once the job completes, the schedd can immediately run another job on the same machine if more jobs are available. So if you can submit your jobs in large groups, they will execute faster.

* Take a look at Condor's Computing On Demand (COD). It's a way to give short jobs quick access to your Condor machines. Section 4.3 of the Condor 7.0 manual has more information:
http://www.cs.wisc.edu/condor/manual/v7.0/4_3Computing_On.html
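
As a rough sketch of the first suggestion, the negotiator settings could go in the local config file on the central manager. The values below are illustrative assumptions, not tested recommendations; very short cycles add load on the negotiator and schedd, so tune them against your own pool.

```
# Local config on the central manager (example values only, assumed for illustration)

# Time between the starts of negotiation cycles, in seconds
# (already lowered to 15 in the original question)
NEGOTIATOR_INTERVAL = 15

# Minimum time that must pass before a new negotiation cycle may start;
# the default is 20 seconds, 5 is an assumed example value
NEGOTIATOR_CYCLE_DELAY = 5
```

After editing the file, running condor_reconfig on the central manager should make the daemons re-read their configuration.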

Thanks and regards,
Jaime Frey
UW-Madison Condor Team