Re: [HTCondor-users] HTcondor High-Availability and dual-stack



This is a bug in HTCondor that occurs when the first DNS record for a hostname in HAD_LIST is an IPv6 address. The HAD daemon is comparing this IP address to the first IP address in its own contact information, which is an IPv4 address, and not recognizing that they belong to the same machine.
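To illustrate the mismatch described above, here is a minimal Python sketch. It is not HTCondor's actual code: the function names are hypothetical, and the addresses are documentation-only placeholders (192.0.2.10 and 2001:db8::10) standing in for one dual-stack host. It only shows why comparing the first address on each side fails when the two lists are ordered differently, and how a comparison over all addresses would recognize the same machine.

```python
import ipaddress

def first_address_matches(dns_records, own_addresses):
    """Compare only the first address on each side, as in the behavior described above."""
    return ipaddress.ip_address(dns_records[0]) == ipaddress.ip_address(own_addresses[0])

def any_address_matches(dns_records, own_addresses):
    """Treat the two lists as the same host if they share at least one address."""
    resolved = {ipaddress.ip_address(a) for a in dns_records}
    own = {ipaddress.ip_address(a) for a in own_addresses}
    return not resolved.isdisjoint(own)

# Hypothetical dual-stack host, using documentation-only addresses.
dns_records = ["2001:db8::10", "192.0.2.10"]    # DNS returns the IPv6 record first
own_addresses = ["192.0.2.10", "2001:db8::10"]  # daemon's own contact info lists IPv4 first

print(first_address_matches(dns_records, own_addresses))  # False
print(any_address_matches(dns_records, own_addresses))    # True
```

The set-based check is just one possible way to make the comparison order-independent; the eventual fix in HTCondor may work differently.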

We will fix this in a future release:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5728

If you need PREFER_IPV4=False for your HTCondor configuration, you can work around this problem by setting PREFER_IPV4 to True for just the HAD and replication daemons, like so:

HAD.PREFER_IPV4 = True
REPLICATION.PREFER_IPV4 = True

Or, you can use IPv4 addresses in HAD_LIST.
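If you go the IPv4-address route, a small script can produce the rewritten value for you. The following is a rough sketch, assuming HAD_LIST is a comma-separated list of host:port entries as in the error message quoted below. The helper name had_list_to_ipv4 is hypothetical and not part of HTCondor; it uses Python's standard socket.getaddrinfo restricted to IPv4 and takes the first IPv4 address returned for each host.

```python
import socket

def had_list_to_ipv4(had_list):
    """Return a HAD_LIST-style string with each host replaced by its first IPv4 address."""
    entries = []
    for entry in had_list.split(","):
        # Split on the last colon so the port is separated from the host.
        host, port = entry.strip().rsplit(":", 1)
        # Restrict resolution to IPv4 (AF_INET); raises socket.gaierror if none exists.
        infos = socket.getaddrinfo(host, int(port),
                                   family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
        ipv4 = infos[0][4][0]  # sockaddr is (address, port) for AF_INET
        entries.append(f"{ipv4}:{port}")
    return ", ".join(entries)

# A numeric address passes through unchanged and needs no DNS lookup.
print(had_list_to_ipv4("127.0.0.1:51450"))  # 127.0.0.1:51450
```

Run it with your real central manager hostnames in place of the loopback address, and check the result before putting it in your configuration, since a host with several IPv4 addresses may need a specific one.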

 - Jaime

> On Jun 8, 2016, at 2:56 AM, Carles Acosta <cacosta@xxxxxx> wrote:
> 
> Hello again,
> 
> I've updated my pool to version 8.5.5 and, with ENABLE_IPV4=auto, ENABLE_IPV6=auto and PREFER_IPV4=true, the error is gone. However, when I change to PREFER_IPV4=false, I still get this error from the HAD daemon:
> 
> HAD CONFIGURATION ERROR:  my address '<ipv4:51450?addrs=[ipv6]-51450+ipv4-51450>'is not present in HAD_LIST 'xxxx.pic.es:51450, xxxx.pic.es:51450'
> 
> Cheers,
> 
> Carles
> 
> On 06/07/2016 12:33 PM, Carles Acosta wrote:
>> Hi,
>> 
>> Ok, thank you very much Brian.
>> 
>> Cheers,
>> 
>> Carles
>> 
>> On 06/07/2016 12:14 PM, Brian Bockelman wrote:
>>> Hi Carles,
>>> 
>>> This is a known bug:
>>> 
>>> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5659
>>> 
>>> The fix was released (yesterday) in v8.5.5.
>>> 
>>> Brian
>>> 
>>>> On Jun 7, 2016, at 10:46 AM, Carles Acosta <cacosta@xxxxxx> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> We are doing some testing with a small dual-stack HTCondor pool. We are running the development version 8.5.4.
>>>> 
>>>> Initially, we were using the options ENABLE_IPV4 = auto, ENABLE_IPV6 = auto and PREFER_IPV4 = false; our idea was to make HTCondor prefer IPv6. We observed that communication between the execution nodes and the central managers was fine, and also with the schedds, but we had problems with the condor_had daemon on our central managers.
>>>> 
>>>> In the HADlog, we can see (where ipv4 and ipv6 are the corresponding addresses):
>>>> 
>>>> HAD CONFIGURATION ERROR:  my address '<ipv4:51450?addrs=[ipv6]-51450+ipv4-51450>'is not present in HAD_LIST 'xxxx.pic.es:51450, xxxx.pic.es:51450'
>>>> 
>>>> The High Availability daemon fails and then the negotiator daemon is not running in any of our central managers.
>>>> 
>>>> Similarly, changing to PREFER_IPV4 = true doesn't solve the problem and we see:
>>>> 
>>>> HADStateMachine::setReplicationDaemonSinfulStringhost names of machine and replication daemon do not match: ipv4:51450?addrs=ipv4-51450+[ipv6 vs. ipv4
>>>> 
>>>> Thus, we have to change to ENABLE_IPV4 = false and PREFER_IPV4 = false, to have High-Availability working again with IPv6 (or ENABLE_IPV6= false to use IPv4).
>>>> 
>>>> I'm not sure if I'm using the correct options or this is a known issue.
>>>> 
>>>> Thanks in advance.
>>>> 
>>>> Best regards,
>>>> 
>>>> Carles