Re: [Condor-users] Special "admin" jobs

On Tue, Apr 19, 2011 at 9:47 AM, Rich Pieri <ratinox@xxxxxxx> wrote:
This sort of process is usually handled better by a configuration management tool like Puppet or Cfengine.

Sound advice. At Cycle Computing we're big fans of Opscode's Chef management utility: http://www.opscode.com/chef/

There are several reasons not to do this as a Condor job:

1. You have to set up a special slot in order to have your admin job vacate all other jobs from the machine to ensure the machine is 'quiet' so you can do updates -- that seems like a lot of overhead for administration jobs;
2. You have to run a job for every machine, but how do you get every machine? If condor_status output is all you have to go on, any machine that is down when you build the machine list will not appear in that output, so you may miss machines.
3. If a machine is down, when and how will it be brought up to date? It could take a very, very long time for you to converge on a complete update of all your machines.
4. Your jobs don't run as root (or as an Administrator on Windows) so you need to elevate another user to root-like status to do updates in some cases and that can be dangerous.
5. On Windows, if a file is in use you can't overwrite it (thank you NTFS...), so that means you can't update things that your update code relies on to run.

And those are just the reasons that come quickly to mind. :)
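
To make point 2 concrete, here is a rough Python sketch of the gap between what condor_status reports and what you actually own. It assumes you keep a separate inventory of hostnames somewhere (the hostnames and function names below are made up for illustration), and it only tells you which machines were missed; it does nothing to update them.

```python
import subprocess


def parse_status_machines(status_output):
    """Return the set of machine names found in condor_status output.

    Expects one name per line; a multi-slot machine shows up once per
    slot, so a set is used to collapse the repeats.
    """
    return {line.strip().lower()
            for line in status_output.splitlines()
            if line.strip()}


def missing_machines(inventory, status_output):
    """Return the inventory hosts that condor_status did not report."""
    seen = parse_status_machines(status_output)
    return sorted(name for name in inventory if name.lower() not in seen)


def query_condor_status():
    """Ask the collector for the Machine attribute of every slot ad."""
    result = subprocess.run(
        ["condor_status", "-format", "%s\n", "Machine"],
        capture_output=True, text=True, check=True)
    return result.stdout
```

Calling missing_machines(inventory, query_condor_status()) with your own inventory list gives the machines an admin job built from condor_status alone would never reach.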

You can write Chef recipes that will quiet a machine with condor_off before applying updates so that you know you're always updating machines that are not running jobs. And you can start up Chef before Condor on your machines so that down machines are brought up to date by Chef before Condor is started and allowed to run jobs.
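
This is not a Chef recipe, but the ordering such a recipe would enforce can be sketched in a few lines of Python. The function names and the wait_until_quiet hook are hypothetical; the hook matters because condor_off only sends the shutdown request, so a real recipe would need its own check that the daemons have actually exited before touching anything.

```python
import subprocess


def run_command(argv):
    """Run a command and raise if it exits non-zero."""
    subprocess.run(argv, check=True)


def update_quietly(apply_updates, run=run_command,
                   wait_until_quiet=lambda: None):
    """Quiet the local machine, apply updates, then restart Condor.

    apply_updates    -- callable that performs the actual update work
    run              -- callable used to execute commands (swappable for tests)
    wait_until_quiet -- hypothetical hook that blocks until no jobs are running
    """
    run(["condor_off"])
    try:
        wait_until_quiet()
        apply_updates()
    finally:
        # Bring Condor back even if the update step failed.
        run(["condor_on"])
```

The try/finally is a design choice for the sketch: a failed update still leaves the machine back in the pool. You may prefer the opposite and keep a half-updated machine out of the pool until someone has looked at it.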

- Ian