Re: [Condor-users] Benchmarking condor
- Date: Thu, 30 Jun 2005 11:48:26 +0100
- From: Matt Hope <matthew.hope@xxxxxxxxx>
- Subject: Re: [Condor-users] Benchmarking condor
On 6/29/05, Juan Ignacio Sánchez Lara <juanignaciosl@xxxxxxxxx> wrote:
> what do you do when you want to measure the throughput and speed-up of your
> Condor cluster? I'm looking for standard benchmarks (instead of running
> multiple instances of a home-made software), but almost everything is only
> MPI-based (and I'd like to measure not only MPI performance but also
> Thank you very much:
> PS: Matthew (maybe you're interested), I'm finally going to implement a web
> interface to the SOAP Condor API, so I promise that in the coming days/weeks
> I can report back about NuSOAP
Condor does its own benchmarking, which is the quickest data for you to get.
Take a look at the sum of:
condor_status -format "vm%d@" VirtualMachineId -format "%s " Machine \
  -format "%d " KFlops -format "%d\n" Mips
Note that this may well report machines which run a startd but do not
run jobs; those should be excluded.
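As a sketch, the per-slot figures can be totalled with awk. The constraint
shown is only one way to skip slots that never run jobs (it assumes your
non-running startds advertise START as permanently false); adjust it to
match how those machines are actually configured.

```shell
#!/bin/sh
# Sum the pool's benchmark figures, one line per VM/slot.
# 'START =!= FALSE' is a hypothetical filter for slots that never
# run jobs -- substitute whatever distinguishes them in your pool.
condor_status -constraint 'START =!= FALSE' \
    -format "vm%d@" VirtualMachineId -format "%s " Machine \
    -format "%d " KFlops -format "%d\n" Mips |
awk '{ kflops += $2; mips += $3 }
     END { printf "total KFlops: %d  total MIPS: %d\n", kflops, mips }'
```

The awk stage just adds up fields 2 and 3 (KFlops and Mips) across every
slot line, so it works the same on any subset you care to constrain to.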
Also note, in general (and specifically for Condor), that benchmarking
something as complex as a cluster is neither an exact science nor an
easy task.
The KFlops/MIPS values are about as useful as BogoMIPS et al., i.e.
useful only as a vague indicator of relative raw CPU performance.
The best way to benchmark your pool is to take the suite of
applications you run on it, run each one in some well-defined,
repeatable and close-to-reality mode, and then see how long the jobs
take to finish.
**Note the 'take to finish' bit.**
This is inherently subjective. For some jobs the returned data becomes
useful as it trickles in; in that case the total wall clock time, plus
the overhead of negotiation, transmission etc. for each job, can be
summed to form a reasonably sound basis for the throughput of the
farm in isolation.
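A hedged sketch of that summation, assuming the jobs ran under a single
cluster id (123 here is made up -- substitute your own) and that the
RemoteWallClockTime attribute is populated for your finished jobs:

```shell
#!/bin/sh
# Sum the wall clock consumed by the finished jobs of cluster 123.
# RemoteWallClockTime covers time on the execute machine including
# restarts, which is roughly the per-job cost we want to total.
condor_history 123 -format "%f\n" RemoteWallClockTime |
awk '{ total += $1 }
     END { printf "total wall clock: %.0f s over %d jobs\n", total, NR }'
```

Dividing that total by the number of jobs gives a mean per-job cost you
can compare between runs of the same suite.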
To make it a better test you should have no other jobs on the farm at
the time, though this may be completely unrealistic.
If the data is useful only when the last job has finished, then timing
until the end of the last job is more meaningful but less useful for
comparisons, since it will be *extremely* variable with respect to some
key limits. With n jobs and m machines, if n is not significantly
bigger or smaller than m then the times will change in big steps:
essentially, a change in the value of n mod m will have a significant
effect on the reported value even if the throughput itself doesn't
change significantly.
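The step effect is easy to see with a toy calculation: with identical
tasks of length t, the last job finishes at roughly ceil(n/m) * t, so
adding a single job can jump the measured time by a whole t. The numbers
below (600 s tasks, 20 machines) are arbitrary:

```shell
#!/bin/sh
# Toy model: n identical jobs of t seconds each on m machines.
# Makespan ~= ceil(n/m) * t; note the jump as n crosses a multiple of m.
t=600   # seconds per job (arbitrary)
m=20    # machines (arbitrary)
for n in 39 40 41; do
    waves=$(( (n + m - 1) / m ))     # integer ceil(n/m)
    echo "n=$n jobs -> makespan $(( waves * t )) s"
done
```

Going from 40 to 41 jobs jumps the makespan from 1200 s to 1800 s, a 50%
increase in the reported number for a 2.5% increase in actual work.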
Another key factor: if the machines (and indeed jobs) in the pool are
very non-uniform then this will affect comparability, unless you tune
things extremely finely (which may work for the benchmark but not too
well in real usage).
Essentially you may think you are asking a reasonable question, but
such simple questions normally spawn 10 more tricky ones; repeat ad
infinitum.
The quick way to get a feel for a stable pool's power is to evaluate
the performance of each machine on a particular set of tasks (where
each task is representative of something you do regularly).
Evaluate the number of such tasks each machine could pump through in
some significant time period (hour/day/week etc.).
Work out the rough split in terms of tasks in your current/projected usage.
Do the maths and you have a rough guide to the throughput your farm
can achieve. If you seem to get significantly lower throughput than
this number suggests, then either your assumptions regarding the task
splits or their closeness to the real load were invalid, or some
aspect(s) of the farm such as scheduling/checkpointing/farm errors are
sucking away useful time.
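A minimal sketch of that arithmetic, with entirely made-up per-machine
rates and task split:

```shell
#!/bin/sh
# Hypothetical pool: 10 fast machines at 6 tasks/hour each,
# 30 slow machines at 2 tasks/hour each.
predicted=$(( 10 * 6 + 30 * 2 ))   # tasks/hour the farm should manage
measured=96                        # tasks/hour actually observed (made up)
echo "predicted: $predicted tasks/hour, measured: $measured"
# A big shortfall in this ratio points at scheduling/checkpointing/farm
# errors -- or at invalid assumptions about the task split.
echo "efficiency: $(( measured * 100 / predicted ))%"
```

Here the farm delivers 96 of a predicted 120 tasks/hour, i.e. 80%; the
missing 20% is what you would go looking for.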
This is a useful thing to spot, since it means you can target your
investigations there and measure whether any changes (removing a
destructive machine, for example) actually lead to a quantitative
improvement.
The short answer (after a very long one) is: not easily, if you want
anything other than trivial evaluations on trivial tasks.