
Re: [Condor-users] Deployment Recommendations




On Apr 23, 2009, at 8:01 AM, James Osborne wrote:


Dear All

My name is James Osborne and I am the Condor Project Manager at Cardiff University in the UK. Now that summer is approaching, and I have some nice new virtualization infrastructure coming on stream, I am in the process of virtualizing our Condor infrastructure. I already have a virtual submit machine which works very well with surprisingly low overhead (I couldn't push it beyond about 4% CPU usage with thousands of 15-minute jobs in the queue). The virtualization infrastructure will soon be a load-balanced pair of 3GHz dual-socket quad-core machines, each with 32GB of RAM and multiple redundant connections into FC storage.

I seem to remember hearing that a good 'rule of thumb' was to have no more than 2000 execute nodes reporting to a single central manager.

1) Is that still the case ?

That was the case a few years ago. Today, one of our single pools has nearly 9k execute slots. If your 9000 slots are on Windows, you'll probably want to make sure they use TCP updates to the collector.
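
As a rough sketch of what enabling TCP updates might look like: the two settings below are the ones the Condor manual of this era describes for it, but the cache size value is only an illustrative assumption and should be sized to your own pool and checked against the collector's file descriptor limits.

```
# On the execute nodes: send ClassAd updates to the collector over TCP
# instead of the default UDP.
UPDATE_COLLECTOR_WITH_TCP = True

# On the central manager: let the collector keep open TCP sockets cached.
# 10000 is an assumed value for a pool of roughly 9000 slots, not a
# recommendation; it should be at least the number of daemons updating.
COLLECTOR_SOCKET_CACHE_SIZE = 10000
```

Check the manual for your Condor version before rolling this out, since the defaults and the advice around these settings may differ between releases.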


2) Has anybody pushed a single central manager to about 9000 execute nodes ?

3) Does it make more sense to deploy 4-5 central managers instead and use flocking ?

It does help in some cases, such as logically separating machines by administrative domain or some other criterion, but it will also make your environment more complicated. We have many cores at Purdue, most of which are in 3 pools, plus several other smaller, flocked pools.
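
For reference, a minimal flocking setup might look something like the sketch below. FLOCK_TO and FLOCK_FROM are the standard settings; all hostnames are hypothetical placeholders, not our actual machines.

```
# On the submit machine: pools to try, in order, when the local pool
# cannot run the job. Hostnames are made up for illustration.
FLOCK_TO = cm-hpc.example.edu, cm-private.example.edu

# On each remote pool (central manager and execute nodes): the submit
# machines allowed to flock in. Hostname is made up for illustration.
FLOCK_FROM = submit.example.edu
```

The remote pools also have to authorize the submit machine for write access. How that is expressed depends on your Condor version and security configuration, so treat this as a starting point only.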


4) If so, would you for example use one central manager per core network router even if that increased the number of managers to 8 or more ?

I try to group them:
   a pool for all sorts of distributed machines around campus,
   a pool of HPC cluster nodes with external WAN connectivity,
   and a pool of cluster nodes that are on private IP space.

5) Has anybody tried to flock jobs to 8 or more central managers ?

 Yep.


I can already see how I can set execute nodes to report to different central managers in my Condor distribution scripts.
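
In case it helps others reading along, pointing an execute node at a particular central manager usually comes down to one setting in that node's local configuration. The hostname below is a hypothetical placeholder; a distribution script would write a different value for each group of machines.

```
# Local config for one group of execute nodes. The central manager
# hostname is made up for illustration.
CONDOR_HOST = cm-campus.example.edu

# COLLECTOR_HOST normally defaults to $(CONDOR_HOST), so it only needs
# setting explicitly if the collector runs elsewhere.
```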

I look forward to hearing from those of you with big pools...

Thanks in advance.  Best regards

James