Thank you so much for your very thoughtful response. My comments are below.
> These are interesting ideas. I would make the following comments:
> - Using a RDBMS is (probably) overkill unless you've got a really huge
> set of hosts. Database systems really come into their own when you need
> to be able to make a (large) number of changes to a datastore whilst
> maintaining transactional consistency. Certainly in the 400-500 node
> pool that I maintain, updating flat files and running
> `condor_reconfig -all` is sufficient.
I agree that a relational database is overkill when your configuration files are
infrequently updated and well tuned to your needs.
Given the very flexible and highly configurable nature of the Condor system,
there are many wonderful things one can do with Condor by dynamically adding
new resource attributes or dynamically changing Condor's policy expressions.
In large companies, machines are typically shared across different groups;
each group owns its machines and therefore has its own set of policies and
settings. Sorting out and remembering which local config files contain
which policies can lead to management headaches. A central database can
help with that.
A database can also open up possibilities that you may not have considered
before. Let's say that a central database makes it easy to change Condor's
policy expressions (START, PREEMPT, RANK, etc.) for arbitrary groups of
machines. Now, let's also say that your boss walks in and wants 50 machines
for his exclusive use RIGHT NOW. Problem solved: just change the START
expression for those 50 machines in your central database.
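To make the START example concrete, here is a minimal sketch, assuming a single SQLite table that stores one policy expression per host and knob. The schema, the function names (`set_policy`, `render_config`), the host names, and the user name `boss` are all hypothetical. Writing the rendered text into each machine's local config file and then running `condor_reconfig -all` is left out.

```python
import sqlite3

# Hypothetical schema: one row per (host, policy knob), e.g. ("node017", "START", "TRUE").
SCHEMA = """
CREATE TABLE IF NOT EXISTS policy (
    host TEXT NOT NULL,
    knob TEXT NOT NULL,
    expr TEXT NOT NULL,
    PRIMARY KEY (host, knob)
)
"""


def open_db(path=":memory:"):
    """Open (or create) the central policy database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def set_policy(conn, hosts, knob, expr):
    """Set one policy expression (e.g. START) on many hosts in a single transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO policy (host, knob, expr) VALUES (?, ?, ?)",
            [(h, knob, expr) for h in hosts],
        )


def render_config(conn, host):
    """Render one host's stored policy expressions as Condor config lines."""
    rows = conn.execute(
        "SELECT knob, expr FROM policy WHERE host = ? ORDER BY knob", (host,)
    ).fetchall()
    return "".join(f"{knob} = {expr}\n" for knob, expr in rows)


if __name__ == "__main__":
    conn = open_db()
    pool = [f"node{i:03d}" for i in range(1, 401)]  # hypothetical 400-node pool
    set_policy(conn, pool, "START", "TRUE")
    # Reserve the first 50 machines for one (hypothetical) user.
    set_policy(conn, pool[:50], "START", 'TARGET.Owner == "boss"')
    print(render_config(conn, "node001"), end="")
    print(render_config(conn, "node051"), end="")
```

Because all 50 rows are updated inside one transaction, the database never holds a half-applied reservation, which is the kind of transactional consistency the quoted message refers to.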
To take this a step further, what if Condor's policy expressions could change
_automatically_ in response to some event (or events)? For example, you could
set up a “rule” to change the policies of a pool when it is highly loaded.
Another “rule” could change pool policies when an important group in your
company isn't meeting its throughput requirements. This is just the tip of
the iceberg.
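A “rule” of this kind can be modeled as a condition over a snapshot of pool state plus an action that yields new policy expressions. The sketch below is one hypothetical way to do it, not an existing Condor feature: the `Rule` type, the snapshot format, the 90% busy threshold, the group labels, and the `Department` job attribute are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of pool state: host -> {"busy": bool, "group": str}
PoolState = Dict[str, Dict[str, object]]
# Policy changes produced by a rule: host -> {knob: expression}
PolicyChanges = Dict[str, Dict[str, str]]


@dataclass
class Rule:
    name: str
    condition: Callable[[PoolState], bool]
    action: Callable[[PoolState], PolicyChanges]


def busy_fraction(state: PoolState) -> float:
    """Fraction of machines in the snapshot that are busy (0.0 for an empty pool)."""
    if not state:
        return 0.0
    return sum(1 for m in state.values() if m["busy"]) / len(state)


def evaluate(rules: List[Rule], state: PoolState) -> PolicyChanges:
    """Apply every rule whose condition holds; later rules override earlier ones."""
    changes: PolicyChanges = {}
    for rule in rules:
        if rule.condition(state):
            for host, knobs in rule.action(state).items():
                changes.setdefault(host, {}).update(knobs)
    return changes


# Hypothetical rule: above 90% busy, machines owned by group "eng"
# only start jobs that carry a (hypothetical) Department = "eng" attribute.
high_load = Rule(
    name="high-load",
    condition=lambda s: busy_fraction(s) > 0.9,
    action=lambda s: {
        h: {"START": 'TARGET.Department == "eng"'}
        for h, m in s.items()
        if m["group"] == "eng"
    },
)

if __name__ == "__main__":
    snapshot = {f"n{i}": {"busy": True, "group": "eng" if i < 3 else "ops"} for i in range(10)}
    for host, knobs in sorted(evaluate([high_load], snapshot).items()):
        print(host, knobs)
```

The changes returned by `evaluate` would still have to be written to the central database and pushed out to the machines; that step is omitted here.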
In order to respond like this, however, we need to capture more information in
the database. It essentially needs to contain the state of the entire pool:
all the machine ads, all the job ads, historical job performance, information
on running daemons, and more. Handling all this data demands a powerful
database. But once all this information is available and centrally
located, it becomes possible to analyze, visualize, and even troubleshoot
the pool.
I have spent quite a bit of time and energy envisioning and then developing
a way to automate and manage Condor. The concepts above are central to
ongoing work at Optena. I welcome your further discussion and exchange of
ideas.
This electronic transmission (and any attached documents) contains information
from Optena Corporation and is for the sole use of the individual or entity to
which it is addressed. If you receive this message in error, please notify me
and destroy the message (and all attached documents) immediately.