[Condor-users] Request for Ideas/Plans: Designing a Large Condor Pool



Dear Condor User Community:

We are setting up a Condor pool that will initially include all lab machines on campus (400-1000 machines); later we plan to add a few of our clusters. While we currently run Condor on some of our smaller clusters, we suspect that the layout for this larger pool will need to differ from that of a standard Condor pool.

For this campus pool, we want a single entry point for users to submit jobs. Since the pool will have tens of thousands of jobs in the queue, with several hundred running simultaneously, we expect that a single schedd, along with the other daemons, will likely be overloaded.

Does anyone have design plans that outline how one might set up a pool with a single point of entry and multiple daemons to spread out the load and provide some redundancy? I've looked in the manual for examples of large deployments but cannot find any. Am I missing something? If you wouldn't mind sharing your pool layout, I think it would be useful to many Condor users, especially if your pool is not a typical one.

--
Jess Cannata
Advanced Research Computing
Georgetown University