Re: [HTCondor-users] can Condor somehow be a HA?
- Date: Mon, 22 May 2017 11:02:55 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] can Condor somehow be a HA?
On 5/22/2017 6:45 AM, lejeczek wrote:
I've only started looking at htcondor, not having a good understanding
of it yet I wonder - htcondor has that concept of "central manager" and
I wonder if this makes it a valid candidate for HA setup?
Does anybody have any experience with/thoughts on htcondor as HA and
could share it here?
First off, understand that if your installation's central manager dies,
currently running jobs will continue to run and even new jobs will
continue to get scheduled in many cases (i.e. new jobs will still get
scheduled to claimed slots). Even in production pools, most sites have
no problem with rebooting their central manager or even taking it down
for an hour or two - while the central manager is down, users may notice
that condor_status stops working, but practically all other common tools
continue to work (condor_submit, condor_q, condor_rm, etc). Thus many
pools don't ever bother with an HA solution for the central manager.
If you are still concerned, the HTCondor central manager is actually
very lightweight and holds very little state (just user priorities), so
it is very amenable to a high availability (HA) setup. You
essentially have two choices:
1. HTCondor can be configured to have two central managers (hot/hot),
and automatically fail over as needed. See the section in the HTCondor
Manual titled "High Availability of the Central Manager" at
2. If you already run your services in a managed virtualized setup
(Mesos+Marathon, OpenStack, vSphere, Hyper-V, etc) that supports
failover, you could set up your HTCondor central manager for HA
leveraging those environments, i.e. the same way you would set up a redundant
email server, for instance.
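
Whichever option you pick, it can be handy to check from a script which
central manager is currently answering. The following is only a minimal
sketch, not something taken from the HTCondor manual: the function name
first_responding_manager and the host names in the usage note are
hypothetical, and it assumes condor_status is on your PATH. It relies only on
the standard "-pool" and "-total" options of condor_status.

```python
import subprocess


def default_runner(cmd):
    """Run a command and return the CompletedProcess (30 second timeout)."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=30)


def first_responding_manager(managers, run=default_runner):
    """Return the first host whose collector answers condor_status, else None.

    `managers` is an ordered list of candidate central manager host names.
    `run` is injectable so the logic can be exercised without a real pool.
    """
    for host in managers:
        # Ask the collector on this host for a summary of the pool.
        cmd = ["condor_status", "-pool", host, "-total"]
        try:
            result = run(cmd)
        except (OSError, subprocess.TimeoutExpired):
            # condor_status missing, or the query hung: try the next host.
            continue
        if result.returncode == 0:
            return host
    return None
```

For example, first_responding_manager(["cm1.example.org", "cm2.example.org"])
would return whichever of those two (made-up) hosts answers first, or None if
neither collector can be reached.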
Hope the above helps