
Re: [HTCondor-users] Backup guidance



Hi,

The Full Monty, which I sometimes use, is to control the entire system build with a provisioning system, such as kickstart combined with (e.g.) puppet or ansible, and to manage the provisioning system's build definition files in a version control system such as svn or git.

And then back up the VC system in a grandfather/father/son scheme with (e.g.) a tape robot system, where the tapes are stored offline in a remote firesafe. You'd have to be pretty unlucky for all that to let you down! Although you'd also need to keep copies of all the OS and application build media as well, to be really sure. And then you need to think about the source code, if you are still paranoid.
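For anyone unfamiliar with grandfather/father/son rotation, here is a minimal sketch of the retention rule in Python. It is only an illustration: the function name `gfs_keep` and the default counts (7 daily, 4 weekly, 12 monthly) are assumptions of mine, not settings from any particular backup product.

```python
from datetime import date, timedelta

def gfs_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates to retain under a simple GFS policy.

    Hypothetical helper: the retention counts are illustrative defaults.
    """
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(dates[:daily])                        # sons: most recent backups

    # Fathers: the newest backup in each of the most recent ISO weeks.
    weeks = {}
    for d in dates:
        weeks.setdefault(tuple(d.isocalendar()[:2]), d)
    keep.update(list(weeks.values())[:weekly])

    # Grandfathers: the newest backup in each of the most recent months.
    months = {}
    for d in dates:
        months.setdefault((d.year, d.month), d)
    keep.update(list(months.values())[:monthly])
    return keep

if __name__ == "__main__":
    start = date(2018, 1, 1)
    nightly = [start + timedelta(days=i) for i in range(199)]  # up to 2018-07-18
    for d in sorted(gfs_keep(nightly)):
        print(d.isoformat())
```

Anything not in the returned set is a candidate for tape reuse. A real rotation would also have to account for failed or missing backups, which this sketch ignores.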

And then you have to test it continuously, since you need to roll out every incremental change through the provisioning system, and then be sure that the same system can recreate a pristine server.
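One rough way to check that a rebuild matches what you expect is to compare file checksums between a reference tree and the freshly provisioned one. The sketch below is just that, a sketch: the function names `manifest` and `drift` and the example paths are hypothetical, and it only compares file contents, not ownership, permissions or installed packages.

```python
import hashlib
from pathlib import Path

def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def drift(reference, rebuilt):
    """Report files that are missing, extra, or changed in the rebuilt tree."""
    ref, new = manifest(reference), manifest(rebuilt)
    return {
        "missing": sorted(set(ref) - set(new)),
        "extra": sorted(set(new) - set(ref)),
        "changed": sorted(k for k in set(ref) & set(new) if ref[k] != new[k]),
    }

if __name__ == "__main__":
    # Hypothetical paths: a checkout of the expected config tree and a copy
    # pulled from the freshly rebuilt server.
    report = drift("/srv/reference/etc/condor", "/srv/rebuilt/etc/condor")
    for kind, names in report.items():
        print(kind, names)
```

If all three lists come back empty, the rebuilt tree matches the reference, at least as far as file contents go.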

It's easier said than done, and near-absolute certainty carries a near-absolutely high cost. And it can always still fail if (say) a nearby volcano erupts or war breaks out.

Ste


On 2018-07-18 16:18, Grant Goodyear wrote:
For protecting against system failure bringing down your cluster, the
high-availability section of the manual might be useful:
http://research.cs.wisc.edu/htcondor/manual/v8.6/3_13High_Availability.html.

I keep our condor config files under version control, and that's about
all the backing up we do. We're using ganglia to track usage history,
so we should back up the RRD data (*shrug*; we don't), but otherwise
we're okay with losing log data in a system crash. It's easy enough to
reinstall and copy over the proper config files to start anew.

-Grant Goodyear-

On Wed, Jul 18, 2018 at 7:07 AM Nathan Sharp <nsharp@xxxxxxxxxxxxxxx>
wrote:

Hello all,

We are investigating HTCondor for some potential production uses
and so far have been having great success! A huge thanks to all who have
helped create all the online documentation on HTCondor.

So far I have not had much luck finding information about how
to properly back up an HTCondor system to protect against system
failure and data loss. Is there information on this available
somewhere? Can the execute, spool, log, and lock folders be backed
up live, or do we need to shut down the daemons to get a clean
state? Are there other pieces that need to be backed up with the
files?

Thanks so much,

Nathan

------------------------------------------------
Nathan Sharp
Phoenix Integration Inc
www.phoenix-int.com [1]
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

--
Grant Goodyear
web: http://www.grantgoodyear.org
e-mail: grant@xxxxxxxxxxxxxxxxx

Links:
------
[1] http://www.phoenix-int.com