
[Condor-users] Deployment Recommendations

Dear All

My name is James Osborne and I am the Condor Project Manager at Cardiff University in the UK.  Now that summer is approaching and I have some nice new virtualization infrastructure coming on stream, I am in the process of virtualizing our Condor infrastructure.  I already have a virtual submit machine which works very well with surprisingly low overhead: I couldn't push it above about 4% CPU usage with thousands of 15-minute jobs in the queue.  The virtualization infrastructure will soon be a load-balanced pair of 3GHz dual-socket quad-core machines, each with 32GB of RAM and multiple redundant connections into FC storage.

I seem to remember hearing that a good 'rule of thumb' was to have no more than 2000 execute nodes reporting to a single central manager.

1) Is that still the case?

2) Has anybody pushed a single central manager to about 9000 execute nodes?

3) Does it make more sense to deploy 4-5 central managers instead and use flocking?

4) If so, would you, for example, use one central manager per core network router, even if that increased the number of managers to 8 or more?

5) Has anybody tried to flock jobs to 8 or more central managers?
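As a rough sizing check behind question 3, a minimal sketch, assuming the remembered 2000-node rule of thumb holds (it is not a documented Condor limit): dividing 9000 execute nodes by 2000 per manager and rounding up gives 5 central managers of about 1800 nodes each. The helper names below are hypothetical.

```python
import math

# Figures from the post; the 2000-node ceiling is a remembered rule of thumb,
# not a documented Condor limit.
NODES_PER_MANAGER = 2000
TOTAL_EXECUTE_NODES = 9000


def managers_needed(total_nodes: int, per_manager: int) -> int:
    """Smallest number of central managers keeping each at or under per_manager nodes."""
    return math.ceil(total_nodes / per_manager)


def split_nodes(total_nodes: int, managers: int) -> list[int]:
    """Spread execute nodes as evenly as possible across the given number of managers."""
    base, extra = divmod(total_nodes, managers)
    return [base + 1 if i < extra else base for i in range(managers)]


if __name__ == "__main__":
    n = managers_needed(TOTAL_EXECUTE_NODES, NODES_PER_MANAGER)
    print(n, split_nodes(TOTAL_EXECUTE_NODES, n))
```

With one manager per core router (question 4), `split_nodes(9000, 8)` would instead give 8 managers of 1125 nodes each, well under the assumed ceiling.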

I can already see how I can set execute nodes to report to different central managers in my Condor distribution scripts.  
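One way a distribution script could do this is sketched below, assuming nodes are assigned to a central manager by the subnet behind each core router. The subnets, hostnames and function names are hypothetical. It emits a `CONDOR_HOST` line for each execute node and, for a submit machine, a `FLOCK_TO` list of the other pools' central managers. Each receiving pool would also need matching `FLOCK_FROM` and security settings, which are not generated here.

```python
import ipaddress

# Hypothetical mapping: subnet behind each core router -> central manager for it.
MANAGER_BY_SUBNET = {
    ipaddress.ip_network("10.1.0.0/16"): "cm1.example.org",
    ipaddress.ip_network("10.2.0.0/16"): "cm2.example.org",
    ipaddress.ip_network("10.3.0.0/16"): "cm3.example.org",
}


def manager_for(node_ip: str) -> str:
    """Return the central manager responsible for the subnet containing node_ip."""
    addr = ipaddress.ip_address(node_ip)
    for subnet, manager in MANAGER_BY_SUBNET.items():
        if addr in subnet:
            return manager
    raise LookupError(f"no central manager assigned for {node_ip}")


def execute_node_config(node_ip: str) -> str:
    # CONDOR_HOST names the central manager this execute node reports to.
    return f"CONDOR_HOST = {manager_for(node_ip)}\n"


def submit_node_config(local_manager: str) -> str:
    # FLOCK_TO lists the other pools' central managers that idle jobs may flock to.
    others = [m for m in MANAGER_BY_SUBNET.values() if m != local_manager]
    return f"CONDOR_HOST = {local_manager}\nFLOCK_TO = {', '.join(others)}\n"


if __name__ == "__main__":
    print(execute_node_config("10.2.5.7"), end="")
    print(submit_node_config("cm1.example.org"), end="")
```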

I look forward to hearing from those of you with big pools...

Thanks in advance.  Best regards