
Re: [condor-users] Condor, compared to Sun Grid Engine



Hello Miskell,

I ran into the same problem last year (using SGE) and recorded a lot of 
performance measurements, which are probably useful for our Condor friends 
too, because the results carry over. 
I plan to switch to Condor soon.

I used SuSE 8.2 on a mixture of dual hyperthreading Xeons and dual/quad 
Opterons: both the length and the variance of the subjob duration are very 
significantly higher on the Xeons. I forced each blast subjob to use 1 
thread, but even then the best results were reached when allowing only 1 
slot per physical CPU. I suppose that the high variation on the Xeons is due 
to the random allocation of two jobs to the two virtual CPUs of the same 
physical one (causing stalls in the pipelines...).
For blasts against the whole human genome, more than 512 MB per node is 
required (to avoid filling up the swap...).
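
A minimal sketch of how such a comparison of subjob durations could be made, assuming the wall-clock time of each subjob has been collected into a list per node type. The function name and the sample numbers are placeholders for illustration only; they are not the measurements described above.

```python
from statistics import mean, stdev

def summarize(durations):
    """Return (mean, standard deviation, coefficient of variation)
    for a list of subjob durations in seconds."""
    m = mean(durations)
    s = stdev(durations)      # sample standard deviation
    return m, s, s / m        # s / m makes the spread comparable across node types

# Placeholder values only, NOT measured data.
samples = {
    "node_type_a": [100.0, 110.0, 90.0],
    "node_type_b": [100.0, 160.0, 40.0],
}

for name, durations in samples.items():
    m, s, cv = summarize(durations)
    print(f"{name}: mean={m:.1f}s sd={s:.1f}s cv={cv:.2f}")
```

The coefficient of variation is used here because a node type with longer jobs will naturally show a larger absolute spread; dividing by the mean separates "slower" from "less predictable".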

To improve the performance of the data processing (the human genome is quite 
a large dataset), I used rsync to cache the data on every dedicated 
computing node and accumulated the results in the local /tmp; the only NFS 
load comes from returning the results (filtered and compressed on the node to 
reduce their size). 
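
The caching scheme above could be sketched roughly as follows. This is an illustration under assumptions, not the script actually used: the function names, the source location and the cache directory are hypothetical, and the filtering step is left out. Only standard rsync options (`-a`, `--delete`) and the Python standard library are used.

```python
import gzip
import shutil
import subprocess
from pathlib import Path

def rsync_command(source, cache_dir):
    """Build the rsync call that mirrors the dataset into the node-local cache.
    A trailing slash on `source` copies the directory contents, not the directory."""
    return ["rsync", "-a", "--delete", source, str(cache_dir)]

def refresh_cache(source, cache_dir):
    """Run rsync; only changed files are transferred on later runs."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(rsync_command(source, cache_dir), check=True)

def compress_result(result_path, dest_dir):
    """Gzip a result file from local scratch into dest_dir (e.g. an NFS mount),
    so that only the compressed output crosses the network."""
    result_path = Path(result_path)
    dest = Path(dest_dir) / (result_path.name + ".gz")
    with open(result_path, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest
```

A job wrapper would call `refresh_cache(...)` once before the blast run, write its output under the local /tmp, and call `compress_result(...)` at the end.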
----------
My conclusion: do not use hyperthreading (kernel 2.6 may allow better 
control over the allocation of jobs to virtual/physical CPUs?), or, for 
better and very stable behavior, invest in AMD64 Opterons and leave the 
Xeons for other uses.

I prefer not to post a PDF to a list: if anybody is interested, feel free to 
ask me for the graphs.

	Cheers,

	Alain
----------

On Monday 19 April 2004 02:25, Miskell, Craig wrote:
> Hi,
> 	I'm evaluating software to use to control our compute farm, and
> I'm currently tossing up between Sun Grid Engine, and Condor.  My gut
> instinct is to like Condor more, probably due to its roots - academic
> rather than commercial, and it just "feels" right.  Plus, among other
> things, SGE requires users have an account on each machine that they
> want to execute a job on, whereas condor will use 'nobody' if required -
> I much prefer condor's method as end users should never be able to even
> theoretically login to our compute farm nodes.
>
> But, SGE can do one particular thing that I'm not sure Condor can, and
> this is a relatively important requirement for what we want to do with
> our farm.  So, I thought I'd ask the list and see if there's some way I
> can poke Condor into doing it too:
>
> One of the tasks we will be using the compute farm for is Blast runs (a
> bioinformatics thing).  We have some nice dual-proc Xeons with
> hyperthreading which condor rightly interprets as 4 virtual
> machines/node.  Empirical testing has shown that our blast runs go best
> when a single job is told it can use 4 threads, which fully utilises the
> node.  However, convincing condor to support this is non-obvious (to
> me).  If I kick off 4 blast jobs, what I want to happen is condor to use
> 4 separate nodes.  Out of the box, what it'll do is use 4 vms on a
> single node, which will slow blast down to a crawl.  I *could* space out
> the starting of the runs so that the load average is up to 4 on each
> node before I kick off another job, but that will introduce quite some
> delay, and is a little bit crude for my liking.  SGE allows for
> consumable resources, so I could set up a virtual "Blast" resource with a
> count of 1 on each node, and then each blast job would "consume" that
> blast resource (a "licence", if you will).  Is there any way to do the
> equivalent with Condor?  Perhaps advertise BLAST=1 for completely empty
> nodes and the blast job sets off a script which changes the
> advertisement for the node to BLAST=0, and require blast=1 for a blast
> job to start?  Would the change in advert get back to the master before
> the next job is dispatched to the same node?
>
> Am I even looking at this the right way, or do I need to shift my view
> completely?
>
> I appreciate any help you can provide,
>
> Thanks,
>
> Craig Miskell,
> Technical Support,
> AgResearch Invermay
> 03 489-9279
> I stopped and considered for a moment whether such a person
> would behave any differently  with his head cut right off, but
> then realized it would make a difference: it would allow  him to
> stand upright again.  	-- Anthony de Boer
> Condor Support Information:
> http://www.cs.wisc.edu/condor/condor-support/
> To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> unsubscribe condor-users <your_email_address>
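
The consumable-resource idea in the quoted message (one "Blast" token per node, consumed by each running blast job) can be illustrated with a toy model. This is only a sketch of the semantics, assuming a simple greedy placement; it is neither SGE nor Condor configuration, and the function and node names are hypothetical.

```python
def assign_jobs(jobs, nodes, tokens_per_node=1):
    """Place each job on the first node that still has a free token.
    Returns (placement, waiting): a job -> node mapping and the jobs left queued."""
    free = {node: tokens_per_node for node in nodes}
    placement = {}
    waiting = []
    for job in jobs:
        # First node with a token left, or None when every token is consumed.
        target = next((n for n in nodes if free[n] > 0), None)
        if target is None:
            waiting.append(job)       # stays queued until a token is released
        else:
            free[target] -= 1         # the job "consumes" the node's token
            placement[job] = target
    return placement, waiting

placement, waiting = assign_jobs(["blast1", "blast2", "blast3"], ["nodeA", "nodeB"])
```

With one token per node, two blast jobs never land on the same machine regardless of how many virtual CPUs it advertises; the third job in the usage line simply waits.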

-- 
        Bonne journée - Have a good day,

        Alain
+--------------------------------------------------------------------------
|  Dr Alain EMPAIN      Bioinformatique, Génétique Moléculaire B43,
|  Fac. Méd. Vétérinaire, Univ. de Liège, Sart-Tilman / B-4000 Liège  
|       Alain.EMPAIN@xxxxxxxxx
|       WORK:+32 4 366 3821 Fax: +32 4 366 4122   GSM:+32 497 701764
|       HOME:+32 85 512341  -- Rue des Martyrs,7  B-4550 Nandrin
