
Re: [Condor-users] machine for central-manager



On Wed, Apr 21, 2010 at 11:50 AM, Santanu Das <santanu@xxxxxxxxxxxxxxxxx> wrote:
How powerful does a machine (CPU cores, memory, etc.) need to be to run the Condor central manager for a ~400-core cluster? I'm interested in virtualization, so I want a rough idea of the hardware we should go for so that all the services/virtual machines run nicely.

Is this machine running condor_collector and condor_negotiator? Or is it also running a condor_schedd daemon? If it's just condor_collector and condor_negotiator: not very powerful at all, really. I kept a 500-slot farm alive and well with an old 2-CPU (single core) Xeon box. It had 4GB of RAM, IIRC, and ran 32-bit CentOS 4. It did have gigabit fiber-based Ethernet. It kept up just fine. The only reason it got swapped out for new hardware was that it was well out of warranty, which apparently makes IT department heads nervous.

Obviously this older hardware wouldn't work well with virtualization. The CPUs aren't virtualization-friendly. But that does say you don't need a lot of juice if you're just running condor_collector and condor_negotiator.
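
To make the collector/negotiator versus schedd distinction concrete, here is a minimal Python sketch that classifies a machine's role from the value of its DAEMON_LIST setting. DAEMON_LIST is a real Condor configuration macro; the function name classify_role and the role labels are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical helper: guess a Condor node's role from its DAEMON_LIST value.
# The role labels are illustrative, not Condor terminology.
def classify_role(daemon_list: str) -> str:
    # Accept comma- and/or whitespace-separated daemon names, any case.
    daemons = {d.upper() for d in re.split(r"[,\s]+", daemon_list) if d}
    is_manager = {"COLLECTOR", "NEGOTIATOR"} <= daemons
    has_schedd = "SCHEDD" in daemons
    if is_manager and has_schedd:
        # Also holds job queues, so sizing depends on the job load too.
        return "central manager + submit node"
    if is_manager:
        # The light case described above: modest hardware is enough.
        return "central manager only"
    return "not a central manager"
```

The input string would typically be whatever `condor_config_val DAEMON_LIST` reports on the machine in question, for example "MASTER, COLLECTOR, NEGOTIATOR" for a pure central manager.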

Worth noting that my farm configuration was incredibly sensitive to negotiation cycle times at the time. I was seeing under 2 minute negotiation cycles with 1 schedd in the system holding 40k jobs (and a fairly heterogeneous job distribution so not a lot of re-clustering happening in the negotiator).
 
Also, is there any recommendation for a virtualization platform as far as Condor is concerned? Is VMware ESXi a good choice?

I currently have a few farms with central managers virtualized on Xen. The VMs are all running CentOS 5.somethingorother. It's not as stable as I was hoping for. If there's any appreciable amount of disk latency between the VM manager program and the disk where the image is based, Xen crashes and takes the images down with it. They restart, but it's annoying. We were hosting the VM images on our NAS to allow easy, nearly instant migration of VMs from machine to machine in case of failure, but we get big latency spikes on our NAS thanks to load from our farm, and the spikes would take out Xen. Had to move 'em to local disk for now, which is much less useful.

Hope that helps.

- Ian