
Re: [Condor-users] GCB Performance



I originally sent this only to Chris, so I am posting it to the group.

The log files show that each job runs for about 10 seconds. To avoid overloading the submit machine with too many running processes and too many files in transit, Condor by default waits 2 seconds between job starts. This is what the manual says:

"This integer-valued macro (JOB_START_DELAY) works together with the JOB_START_COUNT macro to throttle job starts. The condor_schedd daemon starts $(JOB_START_COUNT) jobs at a time, then delays for $(JOB_START_DELAY) seconds before starting the next set of jobs. This delay prevents a sudden, large load on the submit machine as it spawns many condor_shadow daemons simultaneously, and it prevents having to deal with their start up activity all at once. The resulting job start rate averages as fast as ($(JOB_START_COUNT)/$(JOB_START_DELAY)) jobs/second. This configuration variable is also used during the graceful shutdown of the condor_schedd daemon. During graceful shutdown, this macro determines the wait time in between requesting each condor_shadow daemon to gracefully shut down. It is defined in terms of seconds and defaults to 2. Setting this macro to a lower value is not advised, as it can overwhelm the condor_schedd daemon."
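To make the quoted formula concrete, here is a minimal sketch that computes the average start rate and, under a deliberately simplified model, how many jobs can be running at the same time. The function names are hypothetical, the 10-second runtime comes from the log files mentioned above, and JOB_START_COUNT = 1 is assumed; the model ignores shadow startup and file transfer, so it is only a rough bound.

```python
import math


def start_rate(job_start_count: int, job_start_delay: float) -> float:
    """Average job start rate in jobs/second: JOB_START_COUNT / JOB_START_DELAY."""
    return job_start_count / job_start_delay


def max_concurrent_jobs(job_runtime: float, job_start_count: int,
                        job_start_delay: float) -> int:
    """Most jobs that can be running at once when starts are throttled.

    Simplified model (an assumption, not Condor's scheduler): batches of
    job_start_count jobs start exactly job_start_delay seconds apart, every job
    runs for job_runtime seconds, and shadow startup and file transfer cost nothing.
    A job is still running while the next ceil(job_runtime / job_start_delay) - 1
    batches start, so at most ceil(job_runtime / job_start_delay) batches overlap.
    """
    overlapping_batches = math.ceil(job_runtime / job_start_delay)
    return overlapping_batches * job_start_count


# Figures from this thread: ~10 s jobs, default delay of 2 s; JOB_START_COUNT = 1 assumed.
print(start_rate(1, 2))               # 0.5 jobs/second
print(max_concurrent_jobs(10, 1, 2))  # 5 jobs running at once
```

With one start every 2 seconds and 10-second jobs, roughly five jobs overlap, which is where the 5 to 6 VM estimate comes from.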

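With this default configuration, each job finishes before Condor has launched all of the matched jobs, which frees its machine for jobs still waiting for the next match. Because the schedd starts one job every 2 seconds (with the default JOB_START_COUNT of 1) and each job runs about 10 seconds, only about 10 / 2 = 5 jobs are running at any moment. Therefore, you need just 5 to 6 VMs to maximize performance in your case. Adding more machines contributes nothing, which is why you get basically the same performance with 20 VMs and 40 VMs.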
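The plateau can be illustrated with a small simulation: a sketch, not Condor's actual scheduling logic, in which a job starts only when both a slot (VM) is free and the start throttle allows it. The function and parameter names are hypothetical; the 50 jobs and ~10-second runtime come from this thread, and matchmaking, shadow startup and file transfer are ignored, so real wall-clock times will be longer.

```python
import heapq


def makespan(n_jobs: int, job_runtime: float, n_slots: int,
             job_start_count: int = 1, job_start_delay: float = 2.0) -> float:
    """Seconds until the last job finishes under throttled starts.

    Simplified model (an assumption): every job runs for job_runtime seconds,
    a job needs a free slot (VM) to start, and at most job_start_count jobs start
    per batch, with batches at least job_start_delay seconds apart.
    """
    free_at = [0.0] * n_slots   # when each slot becomes free; an all-equal list is a valid heap
    batch_start = 0.0           # time the current batch of starts began
    started_in_batch = 0
    last_finish = 0.0
    for _ in range(n_jobs):
        if started_in_batch == job_start_count:
            # Batch is full: the next start must wait for the throttle delay.
            batch_start += job_start_delay
            started_in_batch = 0
        slot_free = heapq.heappop(free_at)   # earliest slot to become free
        start = max(batch_start, slot_free)
        if start > batch_start:
            # Waiting for a slot opens a new batch at that later time.
            batch_start = start
            started_in_batch = 0
        started_in_batch += 1
        finish = start + job_runtime
        heapq.heappush(free_at, finish)
        last_finish = max(last_finish, finish)
    return last_finish


# 50 jobs of ~10 s each; 2, 20 and 40 slots match 1, 10 and 20 dual-VM nodes.
for slots in (2, 4, 5, 6, 20, 40):
    print(slots, makespan(50, 10, slots))
```

Under this model the makespan falls from 252 s with 2 slots to 108 s with 5 slots and stays at 108 s for 6, 20 or 40 slots: once about 5 VMs are available, the 2-second start throttle, not the number of machines, is the bottleneck.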


Chris Miles wrote:
OK. The Condor pool is made up of machines with exactly the same specification. It's an IBM cluster.

I first ran a test to see how long my 50 jobs would take on just one machine (2 VMs), and it took 5m 11s.
I then loaded up 10 nodes: the jobs took 2m 8s.
I then loaded up 20 nodes: the jobs took 2m 22s.
Attached are the logs from the submission machine for the 10-node and 20-node tests.
Thanks,
Chris