
Re: [Condor-users] Failover feature in condor 6.7.5



Gabi Kliot wrote:
Hi
 
>So the only place I need to change is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second
> collector in the COLLECTOR_HOST variable. Would it be enough to just restart condor on the second server after doing this?
 
You need to add the IP of the second Collector to the COLLECTOR_HOST variable in the $CONDOR_HOME/etc/condor_config file. You also need to add COLLECTOR to the DAEMON_LIST variable in the local config file of the second Collector machine, just as it is done for the first Collector machine.
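For illustration, the two changes might look roughly like this in Condor's configuration syntax. This is a sketch, not part of the original reply. It assumes COLLECTOR_HOST takes a comma-separated list of collectors, the IP addresses are hypothetical placeholders from the 192.0.2.0/24 documentation range, and STARTD stands in for whatever daemons the second machine already runs.

```
# Global config: $CONDOR_HOME/etc/condor_config (read by every machine in the pool)
# List both collectors, comma-separated; these IPs are hypothetical placeholders.
COLLECTOR_HOST = 192.0.2.10, 192.0.2.11

# Local config file of the second Collector machine only.
# Add COLLECTOR to the daemons this machine already runs (STARTD is only an example).
# NEGOTIATOR is deliberately absent: Negotiator failover has not been released yet.
DAEMON_LIST = MASTER, COLLECTOR, STARTD
```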


>Also, is NEGOTIATOR failover done the same way, by adding the second server's IP to the NEGOTIATOR_HOST variable?
>Is there a document that explains how these configurations are done? I would be willing to experiment with this and write a small doc if
> required.
 
Negotiator failover has not yet been released. It is planned to be part of the next Condor release, hopefully by Condor Week.
Whenever it is released, it will of course be accompanied by a detailed manual section on its installation and configuration (it will actually be a unified section covering both Collector and Negotiator high availability).
 
It makes me very happy to know that there are people interested in, and looking forward to, the Negotiator failover feature in Condor. We are working hard these days to make it happen.
 
Regards,
 
Gabi
So the only place I need to change is still the $CONDOR_HOME/etc/condor_config file, right? Here I added the IP of the second collector in the COLLECTOR_HOST variable. Would it be enough to just restart condor on the second server after doing this? I get some errors of this kind when I do this...

DC_AUTHENTICATE: attempt to open invalid session frontier:17998:1110236945:14, failing

Any suggestions? Also, is NEGOTIATOR failover done the same way, by adding the second server's IP to the NEGOTIATOR_HOST variable? Is there a document that explains how these configurations are done? I would be willing to experiment with this and write a small doc if required.

Thanks,
Prakash
Thanks. Could you explain how to restart services on the different machines after changing the COLLECTOR_HOST variable? Is it enough to modify the local config file on the second collector so that it starts the collector daemon, and then restart that server, or should I restart Condor on all the machines in the pool? Also, does this failover have anything to do with flocking at all? (Sorry for the stupid question; it's just that I have never used flocking before.)

Prakash