
Re: [Condor-users] Failover feature in condor 6.7.5



Nick LeRoy wrote:
On Mon March 7 2005 4:18 pm, Prakash Velayutham wrote:
  
Ian Chesal wrote:
    
Hi,

I understand that the failover is a feature added in
condor-6.7.x versions. But I don't understand how to enable
this and configure the pool to work with this setup. Can
anyone help? As far as I know, there is nothing about this in the
documentation. Please correct me if I am wrong.
        
See:
http://www.cs.wisc.edu/condor/manual/v6.7.5/8_2Development_Release.html#SECTION00924000000000000000

The second bullet under "New Features" describes how to define multiple
collectors for failover.
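Illustration (not part of the original thread): judging from Prakash's follow-up, the collectors are listed in COLLECTOR_HOST. Assuming that value is a comma-separated list such as "cm1.example.org, cm2.example.org" (host names hypothetical), a client can fail over by trying each collector in order until one answers. The Python below is only a sketch of that idea; parse_collector_host, query_with_failover and the query callback are invented for illustration and are not Condor's code or API.

```python
from typing import Callable, List, Optional


def parse_collector_host(value: str) -> List[str]:
    """Split an assumed comma-separated COLLECTOR_HOST value into host names."""
    return [host.strip() for host in value.split(",") if host.strip()]


def query_with_failover(hosts: List[str], query: Callable[[str], str]) -> Optional[str]:
    """Try each collector in the listed order and return the first answer."""
    for host in hosts:
        try:
            return query(host)
        except ConnectionError:
            continue  # this collector is unreachable: fall over to the next one
    return None  # no collector in the list answered


# Usage with a fake query in which the first collector is down.
def fake_query(host: str) -> str:
    if host == "cm1.example.org":
        raise ConnectionError("collector unreachable")
    return "ads from " + host


hosts = parse_collector_host("cm1.example.org, cm2.example.org")
print(query_with_failover(hosts, fake_query))  # prints "ads from cm2.example.org"
```

The order of the list matters in this sketch: the first reachable entry always wins, so the primary collector should be listed first.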

- Ian
      
Hi Ian,

Thanks. What does the "High Availability" service listed under "New
Features" at the same link (8.2.6, Version 6.7.0) mean? It says:

Added a new "High Availability" service to the condor_master. You
can now specify a daemon which can have "fail over" capabilities (i.e.
the master on another machine can start a matching daemon if the first
one fails). Currently, this is only available over a shared file system
(i.e. NFS), and has only been tested for the condor_schedd.
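Illustration (not part of the original thread): the release note does not say how the two masters coordinate over the shared file system. Assuming it works roughly like a lease, the idea can be sketched with a lock file on shared storage: the standby master starts its own copy of the daemon only after the active one has stopped refreshing the lock. This is a generic Python sketch; try_acquire_lock, refresh_lock, the lock path and the timeout are hypothetical and are not Condor's implementation or configuration.

```python
import os
import time


def try_acquire_lock(lock_path: str, stale_after: float) -> bool:
    """Return True if this process becomes the active instance.

    The lock is a file on shared storage created with O_EXCL. A lock that
    has not been refreshed for `stale_after` seconds is assumed to belong
    to a failed holder and is taken over.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            if time.time() - os.path.getmtime(lock_path) < stale_after:
                return False  # holder is still alive: stay on standby
            os.remove(lock_path)  # holder stopped refreshing: break the lock
        except FileNotFoundError:
            pass  # lock vanished in the meantime: just retry the create
        return try_acquire_lock(lock_path, stale_after)
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # record which process holds it
    return True


def refresh_lock(lock_path: str) -> None:
    """Called periodically by the active holder to show it is still alive."""
    os.utime(lock_path)
```

The holder must call refresh_lock well inside the stale_after window. The sketch ignores the race in which two standby machines break the same stale lock at once, and O_EXCL may not be atomic on older NFS setups.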

I was looking to implement that. Is that the same as multiple collectors?
    

These are separate mechanisms, at least for now.  :-(  The feature that you 
describe above is currently just for schedd fail-over.  Separately, in recent 
6.7 Condor releases, your pool can now have redundant collectors.

We very much hope that the next 6.7 release of Condor will include a
fail-over mechanism for negotiators. This is, again, a different
mechanism.

-Nick
So the only file I need to change is still $CONDOR_HOME/etc/condor_config, right? There I added the IP address of the second collector to the COLLECTOR_HOST variable. Is it enough to just restart Condor on the second server after doing this? When I do, I get errors like this:

DC_AUTHENTICATE: attempt to open invalid session frontier:17998:1110236945:14, failing

Any suggestions? Also, is NEGOTIATOR failover configured the same way, by adding the second server's IP address to the NEGOTIATOR_HOST variable? Is there a document that explains how to set up these configurations? I would be willing to experiment with this and write a short document if needed.

Thanks,
Prakash