Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] CM Failover with submits from CM

Date: Tue, 14 Jul 2009 12:23:49 -0500
From: Matthew Farrellee <matt@xxxxxxxxxx>
Subject: Re: [Condor-users] CM Failover with submits from CM

Janzen Brewer wrote:
> Thanks for the prompt replies.
> 
> I suppose my question has changed now. Is there any way to implement 
> Condor such that there is no single point of failure?
> 
> I've heard of DRBD, which I suppose could be used for redundancy in a 
> shared file system. I'd prefer not to have to implement it, though, as 
> my co-workers have told me it's more trouble than it's worth (e.g. 
> split-brain issues).
> 
> Thanks,
> Janzen

The split-brain problem is pretty standard in any distributed system,
including Condor.
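A toy Python simulation of how split-brain arises. It assumes a naive failover rule in which a standby promotes itself whenever it cannot reach its peer. The node names `cm1` and `cm2` and the helper names are hypothetical. During a network partition both nodes keep running, the old primary never learns that it should step down, and the result is two active nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active: bool


def naive_tick(node: Node, peer_reachable: bool) -> None:
    """Hypothetical failover rule: a standby that cannot see its peer promotes itself."""
    if not node.active and not peer_reachable:
        node.active = True


def simulate_partition() -> list:
    primary = Node("cm1", active=True)
    standby = Node("cm2", active=False)
    # Network partition: both nodes are still up, but neither can reach the other.
    for node in (primary, standby):
        naive_tick(node, peer_reachable=False)
    # Return the names of every node that now believes it is active.
    return [n.name for n in (primary, standby) if n.active]


print(simulate_partition())  # ['cm1', 'cm2']
```

Any HA scheme, Condor's included, has to decide which side wins when the peers cannot see each other.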

You can run the Central Manager in a high-availability (HA) configuration,
where each Collector holds an active copy of the data. For HA of the
Schedd (submit node), we rely on a shared file system between the schedd
nodes, and only one node is active at a time.
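A minimal Python sketch of the single-active idea. It assumes a lease file on the shared file system: a node may run the schedd only while it holds a lease that it keeps renewing, and a standby takes over once the lease has gone unrenewed for longer than a hold time. The class `LeaseLock`, its methods, and the `hold_time` parameter are hypothetical and for illustration only. Condor's own mechanism, configured through macros such as `MASTER_HA_LIST` and `HA_LOCK_URL`, differs in detail.

```python
import os
import time
from typing import Optional, Tuple


class LeaseLock:
    """Hypothetical lease lock stored in a file on a shared file system.

    Only the node holding a fresh lease should run the schedd. The holder
    renews by calling try_acquire() again before hold_time seconds pass;
    a standby may take over once the lease is older than hold_time.
    """

    def __init__(self, path: str, node: str, hold_time: float) -> None:
        self.path = path            # lock file on the shared file system
        self.node = node            # this node's name (must not contain spaces)
        self.hold_time = hold_time  # seconds a lease stays valid without renewal

    def _read(self) -> Tuple[Optional[str], float]:
        """Return (owner, last_renewal_time), or (None, 0.0) if no lease exists."""
        try:
            with open(self.path) as f:
                owner, stamp = f.read().split()
                return owner, float(stamp)
        except FileNotFoundError:
            return None, 0.0

    def _write(self, now: float) -> None:
        """Record this node as owner; os.replace keeps readers from seeing a partial file
        on file systems where rename is atomic."""
        tmp = f"{self.path}.{self.node}.tmp"
        with open(tmp, "w") as f:
            f.write(f"{self.node} {now}")
        os.replace(tmp, self.path)

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Acquire or renew the lease; return True if this node may be active."""
        now = time.time() if now is None else now
        owner, stamp = self._read()
        # Free lease, our own lease, or a lease nobody renewed in time.
        # NOTE: this read-then-write is not atomic across nodes, so two
        # standbys checking at the same instant could both take over.
        if owner in (None, self.node) or now - stamp > self.hold_time:
            self._write(now)
            return True
        return False
```

A standby would call `try_acquire` periodically, more often than `hold_time`, and start the schedd only when it returns `True`. Because the stale-lease check is a read followed by a write, closing the remaining race depends on the locking guarantees of the shared file system, which is one reason the choice of file system matters.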

There are options other than DRBD for distributed file systems; some
have better fail-over characteristics than others. You might want to
weigh the faults you can't tolerate against what it costs to handle
them.

Best,

matt
