
Re: [Condor-users] Condor Learning Curve



On Tue, May 4, 2010 at 7:03 PM, Mag Gam <magawake@xxxxxxxxx> wrote:
Nice to hear some feedback from veterans. My number 1 problem is
supporting the users.  I have written a suite of scripts to help me
out. Currently, I manage close to 2000 servers without any issues.

On that scale I think you'd be a glutton for punishment *not* wrapping up access to Condor behind some kind of interface, either your own solution or something like Cycle Server. That gives you a means to stage upgrades (splitting pools and moving jobs by user or type to newer versions of Condor as users qualify their jobs on the technology; this approach works well for hardware too) and to enforce policies (for example, always requiring a specific ClassAd key-value pair in your jobs).
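
As a rough illustration of the policy-enforcement idea, a submit wrapper could refuse any submit description that lacks a required custom ClassAd attribute. This is only a sketch: it assumes custom attributes are given in the submit file as `+Name = value` lines, and the attribute name `Project` and the helper names are made up for the example.

```python
import re

# Hypothetical site policy: every job must carry these custom ClassAd attributes.
REQUIRED_ATTRS = {"Project"}

# Matches submit-file lines of the form "+Name = value".
_ATTR_LINE = re.compile(r"^\s*\+(\w+)\s*=\s*(.+?)\s*$")


def custom_attrs(submit_text):
    """Return custom ClassAd attributes found in a submit description.

    Keys are lowercased because ClassAd attribute names are case-insensitive.
    """
    attrs = {}
    for line in submit_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = _ATTR_LINE.match(line)
        if match:
            attrs[match.group(1).lower()] = match.group(2)
    return attrs


def missing_attrs(submit_text, required=REQUIRED_ATTRS):
    """Return the sorted list of required attributes absent from the file."""
    present = custom_attrs(submit_text)
    return sorted(name for name in required if name.lower() not in present)
```

A wrapper would call `missing_attrs` on the user's submit file and only hand it on to `condor_submit` when the returned list is empty.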

For managing the configs on that scale, are you hosting everything on shared disk? Do you have cascading configuration files, so that every machine reads the same top-level config file and then there are branches based on attributes of the machine (OpSys, Arch, custom attributes, etc.)? That's the real key to success if you don't employ a management tool.
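
To make the cascading idea concrete, here is a small sketch of how the read order could be computed for one machine. The directory layout (`global.conf` at the top, one `local.conf` per branch) and the function names are assumptions for illustration, not something Condor prescribes.

```python
from pathlib import PurePosixPath


def config_chain(root, attrs, keys=("OpSys", "Arch", "Hostname")):
    """List config files from most general to most specific.

    Hypothetical layout: <root>/global.conf, then one local.conf per branch,
    e.g. <root>/LINUX/local.conf and <root>/LINUX/X86_64/local.conf.
    The walk stops at the first attribute the machine does not define.
    """
    root = PurePosixPath(root)
    chain = [root / "global.conf"]
    branch = root
    for key in keys:
        value = attrs.get(key)
        if not value:
            break
        branch = branch / value
        chain.append(branch / "local.conf")
    return [str(path) for path in chain]


def merge_settings(layers):
    """Merge settings dicts in order; later (more specific) layers win."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged
```

For a machine with `OpSys` set to `LINUX` and `Arch` set to `X86_64`, `config_chain("/shared/condor", ...)` yields the global file, then the `LINUX` branch, then the `LINUX/X86_64` branch, so a setting in a deeper file overrides the same setting higher up.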
 
I would like to write a Web GUI for it and release it under the Apache
license, but that's a couple of months away. :-)

Sounds cool. Keep us posted.

- Ian