Re: [Condor-users] Ditributed computing performance



I am not certain where your speedup is going to come from.
HTC (high-throughput computing) is about getting more jobs finished in
a given amount of time rather than getting a single job to finish quicker.

So if I have 1,000 jobs that take an hour each, I could get a fair amount
of speedup by spreading them out over 4 machines rather than running
them on a single node.
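
As a back-of-the-envelope check of those figures, here is a small Python
sketch. It assumes the jobs are fully independent, that each machine runs
one job at a time, and that scheduling and file-transfer overhead can be
ignored; the function name is purely illustrative and is not part of Condor.

```python
import math

def makespan_hours(n_jobs, hours_per_job, n_machines):
    """Wall-clock hours to finish n_jobs independent jobs,
    assuming one job per machine at a time and no overhead."""
    rounds = math.ceil(n_jobs / n_machines)  # batches of jobs run side by side
    return rounds * hours_per_job

single = makespan_hours(1000, 1.0, 1)  # everything on one node
pool = makespan_hours(1000, 1.0, 4)    # spread over a 4-machine pool
print(single, pool, single / pool)
```

Note that each individual job still takes an hour; only the time to get
through the whole batch shrinks.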

To get a speedup for your single job you will have to "cut it down"
somehow, say by reducing the data it works over or the number of
iterations it runs, with the remaining data or iterations being handled
by the other 3 jobs.
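
One way to picture the "cutting down" is the rough Python sketch below.
It assumes the long job is an event-based simulation whose events can be
processed independently, and that each piece should get its own random
seed; neither is something I know about your program, and the names
(split_events, base_seed) are made up for illustration.

```python
def split_events(total_events, n_jobs, base_seed=1000):
    """Divide total_events into n_jobs near-equal chunks.
    Each chunk gets a distinct seed (an assumption: the simulation
    is seeded, and pieces must not repeat the same random stream)."""
    base, extra = divmod(total_events, n_jobs)
    chunks = []
    first = 0
    for i in range(n_jobs):
        count = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append({"job": i, "first_event": first,
                       "n_events": count, "seed": base_seed + i})
        first += count
    return chunks

for chunk in split_events(1_000_000, 4):
    print(chunk)
```

Each chunk would then be submitted as a separate job, and the outputs
merged afterwards; how to merge depends entirely on your application.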

HTC (and Condor) works best for parameter sweeps, Monte Carlo runs and
similar so-called "serial" jobs (I dislike that term since the jobs can
typically be run in any order; I prefer the term "independent"). Jobs that
need to regularly exchange information and synchronise ("parallel" jobs
such as MPI) are a better fit for the HPC, single-cluster model. Condor can
be configured to do MPI work, but it is best to get an idea of how it works
using the HTC model first.

I hope this is of some help

Cheers

JK


> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Leo Cristobal C.
> Ambolode II
> Sent: Wednesday, September 19, 2007 7:02 AM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] Ditributed computing performance
> 
> 
> Hi condor-users and developers,
> 
> I have a simple condor pool consists of 4 Linux machines. I am about to
> evaluate this cluster of computers. So far, I've been able to test its
> speedup, I simulate a single long-running job (it takes about a day to a
> week to finish the jobs). I increase the number of machines (say from
> single machine up to using 4 machines) used in simulating the program. SO
> far so good. I am using programs/applications related in our field which
> is High Energy Physics; we used SimTools which in turn used ROOT and
> GEANT4 (URL's are www.root.cern.ch and www.geant4.cern.ch, respectively).
> 
> I've read "Distributed and Parallel Computing" book by Al-Rashini?, I'm
> sorry if I did not get the correct title or the correct spelling of the
> author. It talks about Response Time, Throughput, Network..., etc. Have
> anyone tried the evaluation I am going to make? What are the appropriate
> performance parameters that I am going to investigate and how should it be
> done? I only have 4 machines. At first, I am only interested with speedup
> and more on parallel computing, but since my study is on distributed
> computing and is somehow differs from parallel computing, then I have to
> investigate more to justify distributed HTC.
> 
> I thank you in advance. If you have further questions regarding the nature
> of my study, feel free to ask me.
> 
> Sincerely,
> 
> Leo