
[HTCondor-users] daemons not using IPv4 on unusable IPv6 network



Hi all,

We had an unexpected dual-stack test run yesterday when our central node [collector+negotiator] started using IPv6 due to a network misconfiguration. On a technical level, that worked fine and the SharedPort just added the IPv6 address [1].
However, as a result many other HTCondor components started using IPv6 even though it obviously cannot work for them:

- The Negotiator started talking to local daemons on IPv6 even though forward resolution failed [2].
- Various Schedds tried to contact the Negotiator over IPv6 even though they had no way to reach its IPv6 address [3]. The Schedd hosts have only a link-local IPv6 address at the moment.
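
To make the address scopes in these two cases explicit, here is a minimal Python sketch that classifies the addresses appearing in the log excerpts [2] and [3]. It uses only the standard `ipaddress` module; the helper name `describe` is just for illustration and is not part of HTCondor.

```python
import ipaddress

# Addresses copied from the log excerpts [1], [2] and [3].
ADDRESSES = [
    "fe80::3a63:bbff:fe3f:59b4",               # source address seen in [2]
    "2a00:1398:10a:610d:3a63:bbff:fe3f:7a08",  # central node's IPv6 address
    "10.97.13.108",                            # central node's IPv4 address
]

def describe(addr: str) -> str:
    """Return the IP version and a coarse scope label for one address."""
    ip = ipaddress.ip_address(addr)
    # Check link-local first: fe80::/10 also counts as "private" in ipaddress.
    if ip.is_link_local:
        scope = "link-local"
    elif ip.is_loopback:
        scope = "loopback"
    elif ip.is_private:
        scope = "private"
    else:
        scope = "global"
    return f"IPv{ip.version} {scope}"

if __name__ == "__main__":
    for addr in ADDRESSES:
        print(f"{addr}: {describe(addr)}")
```

A host whose only IPv6 address is link-local cannot reach a global IPv6 address on another network, which matches the "Network is unreachable" error in [3].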

Since we want to start using dual stack soon, we would like to understand these side-effects.
The main question for us is: why did these components try to use IPv6 at all? We use the default settings for IPv4/IPv6 selection [4], which seem to imply that IPv6 is not used unless IPv4 is not working.
Are there additional knobs that we should set to prefer IPv4, or that we might accidentally have set to prefer IPv6?

Our Schedds are HTC 8.6.1 - 8.6.4, Collector+Negotiator are HTC 8.6.0.

Cheers,
Max

[1]
07/10/17 06:34:36 (pid:3203402) (D_ALWAYS) About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
(...)
MyAddress = "<10.97.13.108:9618?addrs=10.97.13.108-9618+[--1]-9618&noUDP>"
(...)
07/10/17 06:34:54 (pid:3203402) (D_ALWAYS) Got SIGHUP.  Re-reading config files.
07/10/17 06:34:54 (pid:3203402) (D_ALWAYS) main_config() called
07/10/17 06:34:54 (pid:3203402) (D_ALWAYS) About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
(...)
MyAddress = "<10.97.13.108:9618?addrs=10.97.13.108-9618+[2a00-1398-10a-610d-3a63-bbff-fe3f-7a08]-9618&noUDP>"
(...)
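
For anyone comparing the two MyAddress values above, the following Python sketch pulls the individual addresses out of the `addrs=` field. The format is an assumption inferred only from these two log lines: entries separated by `+`, host and port separated by the last `-`, and IPv6 hosts in brackets with `:` written as `-`. The function `parse_addrs` is hypothetical and not an HTCondor API.

```python
import ipaddress
from typing import List, Tuple

def parse_addrs(sinful: str) -> List[Tuple[str, int]]:
    """Extract (host, port) pairs from the addrs= field of a MyAddress string.

    Assumed format (inferred from the log lines, not from HTCondor docs):
    <primary?addrs=HOST-PORT+[V6-WITH-DASHES]-PORT&other>
    """
    _, _, query = sinful.strip("<>").partition("?")
    value = ""
    for field in query.split("&"):
        if field.startswith("addrs="):
            value = field[len("addrs="):]
            break
    pairs = []
    for entry in filter(None, value.split("+")):
        host, port = entry.rsplit("-", 1)
        if host.startswith("["):
            # Bracketed IPv6: drop the brackets, turn dashes back into colons.
            host = host.strip("[]").replace("-", ":")
        # ip_address() validates the host and normalises its spelling.
        pairs.append((str(ipaddress.ip_address(host)), int(port)))
    return pairs

if __name__ == "__main__":
    after = ("<10.97.13.108:9618?addrs=10.97.13.108-9618+"
             "[2a00-1398-10a-610d-3a63-bbff-fe3f-7a08]-9618&noUDP>")
    for host, port in parse_addrs(after):
        print(host, port)
```

Under that assumption, the first MyAddress value advertises 10.97.13.108 plus the IPv6 loopback `::1`, and the second advertises 10.97.13.108 plus the global address 2a00:1398:10a:610d:3a63:bbff:fe3f:7a08.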

[2]
07/10/17 06:40:01 (pid:3203476) (D_ALWAYS) WARNING: forward resolution of fe80::3a63:bbff:fe3f:59b4 doesn't match fe80::3a63:bbff:fe3f:59b4!
07/10/17 06:40:01 (pid:3203476) (D_ALWAYS) PERMISSION DENIED to condor_pool@xxxxxxxxx from host fe80::3a63:bbff:fe3f:59b4 for command 421 (Reschedule), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: fe80::3a63:bbff:fe3f:59b4, hostname size = 0, original ip address = fe80::3a63:bbff:fe3f:59b4
07/10/17 06:40:01 (pid:3203476) (D_ALWAYS) DC_AUTHENTICATE: Command not authorized, done!
07/10/17 06:40:03 (pid:3203476) (D_ALWAYS) WARNING: forward resolution of 2a00:1398:10a:610d:3a63:bbff:fe3f:7a08 doesn't match 2a00:1398:10a:610d:3a63:bbff:fe3f:7a08!
07/10/17 06:40:03 (pid:3203476) (D_ALWAYS) PERMISSION DENIED to unauthenticated@unmapped from host 2a00:1398:10a:610d:3a63:bbff:fe3f:7a08 for command 451 (GetPriority), access level READ: reason: READ authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 2a00:1398:10a:610d:3a63:bbff:fe3f:7a08, hostname size = 0, original ip address = 2a00:1398:10a:610d:3a63:bbff:fe3f:7a08

[3]
07/10/17 06:40:40 (pid:4473) (D_ALWAYS) attempt to connect to <[2a00:1398:10a:610d:3a63:bbff:fe3f:7a08]:9618> failed: Network is unreachable (connect errno = 101).
07/10/17 06:40:40 (pid:4473) (D_ALWAYS|D_FAILURE) Failed to send RESCHEDULE to negotiator NEGOTIATOR: SECMAN:2003:TCP connection to negotiator NEGOTIATOR failed.

[4]
PREFER_IPV4 = true
ADVERTISE_IPV4_FIRST = $(PREFER_IPV4)
PREFER_OUTBOUND_IPV4 = $(PREFER_IPV4)
