Thank you so much for your very thoughtful response. My comments are included below:
> These are interesting ideas. I would make the following comments:
> - Using a RDBMS is (probably) overkill unless you've got a really huge
> set of hosts. Database systems really come into their own when you need
> to be able to make a (large) number of changes to a datastore whilst
> maintaining transactional consistency. Certainly in the 400-500 node
> pool that I maintain, updating flat files and running `condor_reconfig
> -all` is sufficient.
I definitely agree that a relational database is overkill when your configuration files are infrequently updated and well-tuned to your needs.
Given the very flexible and highly configurable nature of the Condor system, there are many wonderful things one can do with Condor by dynamically adding new resource attributes or dynamically changing Condor’s policy expressions. In large companies, machines are typically shared across different groups; each group owns its machines and hence has its own set of policies and settings. Sorting out and remembering which local config files contain which policies can lead to management headaches. A central database can help with that.
A database can open up some possibilities that you may not have considered before. Let's say that a central database makes it easy to change Condor's policy expressions (START, PREEMPT, RANK, etc.) for arbitrary groups of machines. Now, let's also say that your boss walks in and wants 50 machines for his exclusive use RIGHT NOW. Problem solved: it's easy to just change the START expression for 50 machines in your central database.
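As a rough sketch of that scenario (an illustration only, not a description of any existing tool): the table name `policy`, its columns, the host names, and the helpers `reserve_machines` and `render_config` are all hypothetical, and SQLite stands in for whatever RDBMS you would really use. The START expression shown assumes jobs are matched on the job ad's `Owner` attribute; a real pool would need an expression that fits its own policy.

```python
import sqlite3

def reserve_machines(conn, hosts, owner):
    """Set START so that only jobs owned by `owner` may start on `hosts`."""
    expr = f'(Owner == "{owner}")'  # assumed policy expression, adjust to taste
    with conn:  # one transaction: all hosts change, or none do
        conn.executemany(
            "INSERT OR REPLACE INTO policy (host, attr, expr) "
            "VALUES (?, 'START', ?)",
            [(host, expr) for host in hosts],
        )
    return expr

def render_config(conn, host):
    """Render the stored policy for one host as config-file lines."""
    rows = conn.execute(
        "SELECT attr, expr FROM policy WHERE host = ? ORDER BY attr", (host,)
    )
    return "\n".join(f"{attr} = {expr}" for attr, expr in rows)

# Hypothetical schema: one row per (host, policy attribute).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (host TEXT, attr TEXT, expr TEXT, "
    "PRIMARY KEY (host, attr))"
)

hosts = [f"node{i:03d}" for i in range(1, 51)]  # the 50 machines
reserve_machines(conn, hosts, "boss")
print(render_config(conn, "node001"))
```

The rendered lines would still have to reach each machine's configuration and be picked up by a reconfig, which this sketch leaves out.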
To take this a step further, what if Condor's policy expressions could change _automatically_ in response to some event (or events)? To give an example, you could set up a “rule” to change the policies of a pool when it is highly loaded. Another “rule” could exist to change pool policies when certain throughput requirements aren't being met by an important group in your company. This is just the tip of the iceberg.
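To make the idea of a “rule” a little more concrete, here is a minimal sketch, assuming a rule is simply a condition on some summary of pool state plus a set of policy changes to apply when the condition holds. The names `Rule`, `apply_rules`, and `claimed_fraction`, the 0.9 threshold, and the expressions themselves are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

PoolState = Dict[str, float]  # hypothetical summary of pool state
Policy = Dict[str, str]       # policy attribute name -> expression text

@dataclass
class Rule:
    name: str
    condition: Callable[[PoolState], bool]  # when does the rule fire?
    changes: Policy                          # expressions to override

def apply_rules(rules, state, policy):
    """Return a new policy with the changes of every rule that fires."""
    updated = dict(policy)  # leave the caller's policy untouched
    for rule in rules:
        if rule.condition(state):
            updated.update(rule.changes)
    return updated

rules = [
    # Illustrative rule: allow preemption once the pool is over 90% claimed.
    Rule("high-load", lambda s: s["claimed_fraction"] > 0.9,
         {"PREEMPT": "TRUE"}),
]

base = {"START": "TRUE", "PREEMPT": "FALSE"}
busy = apply_rules(rules, {"claimed_fraction": 0.95}, base)
idle = apply_rules(rules, {"claimed_fraction": 0.40}, base)
print(busy)
print(idle)
```

Returning a new policy rather than editing the old one in place keeps the baseline around, so the pool can fall back to it once the condition no longer holds.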
To respond like this, however, we need to capture more information in the database. It essentially needs to contain the state of the entire pool: all the machine ads, all the job ads, historical job performance, information on running daemons, and more. Handling all this data demands a powerful database. But once all this information is available and centrally located, it becomes possible to analyze, visualize, and even troubleshoot Condor.
I have spent quite a bit of time and energy envisioning and then developing a way to automate and manage Condor. The concepts above are central to ongoing work at Optena. I welcome further discussion and exchange of ideas.
Founder & CTO
Direct : +1.408.321.9006
Fax : +1.408.904.5992