Re: [HTCondor-users] daemons not using IPv4 on unusable IPv6 network

Thumbs up! *puts on my guess-we-should-update-more-often-hat*

	No, you're good.  v8.6.5 isn't actually out yet. ;)

Our ALLOW for this is indeed 'condor_pool@xxxxxxxxx', and the IPv6 was indeed not resolved to gridka.de.

Give me a pointer on how Condor handles these identities:
- In this case, 'gridka.de' is the UID_DOMAIN, not the actual fqdn domain.

- Unless we allow it to (TRUST_UID_DOMAIN), the UID_DOMAIN can only be used from hosts which also share the domain in their fqdn.

- Since the fqdn cannot be resolved/does not resolve to '*.gridka.de', the authorisation fails because the UID_DOMAIN cannot be verified.

So even though we do not explicitly require the host name to match gridka.de, it is implicitly required to match the domain name.

Our security guy looked at this, and I got it wrong: the problem wasn't that the IPv6 address didn't resolve, but was another instance of the problem where '*' in ALLOW lists is misinterpreted as IPv4-only. In general, the form for security principals is 'user@uid-domain/source', so you can say things like 'condor_pool@*/10.*' to accept the pool password only from your private network while still having more than one UID domain in it. However, if you don't specify a source, e.g., you specify 'condor_pool@xxxxxxxxx', HTCondor internally converts that to 'condor_pool@xxxxxxxxx/*' before parsing it... and buggily goes on to treat that '*' as if it were IPv4-only. This has been fixed and the fix will be released in v8.6.5 and v8.7.3.
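
To make the intended behaviour concrete, here is a rough Python sketch of how a 'user@uid-domain/source' entry could be normalised and matched. This is not HTCondor's code: the function names, the use of shell-style globbing via fnmatch, and the 'example.org' UID domain are all assumptions made for illustration only.

```python
from fnmatch import fnmatchcase


def normalize_principal(entry: str) -> str:
    """Append '/*' when an ALLOW entry carries no source part (hypothetical helper)."""
    return entry if "/" in entry else entry + "/*"


def matches(entry: str, user: str, source_ip: str) -> bool:
    """Check a 'user@uid-domain' identity and its source address against one entry.

    Assumption: both halves are plain shell-style globs, and a bare '*'
    source is address-family agnostic (the behaviour after the fix).
    """
    identity, _, source = normalize_principal(entry).partition("/")
    return fnmatchcase(user, identity) and fnmatchcase(source_ip, source)


# 'example.org' stands in for a UID domain; it is not taken from the thread.
print(normalize_principal("condor_pool@example.org"))
print(matches("condor_pool@*/10.*", "condor_pool@example.org", "10.1.2.3"))
print(matches("condor_pool@example.org", "condor_pool@example.org", "2a00:1450::1"))
```

With a source-less entry, the implied '*' should accept an IPv6 peer just as readily as an IPv4 one; the bug described above behaved as if that last call had returned False.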

Does the regular NETWORK_INTERFACE play a role in this too? On the Schedds, it is set to the private IPv4 address.

As far as I know, only the other daemon's NETWORK_INTERFACE plays a role (by determining which address that daemon will advertise). The daemon choosing which address to connect to doesn't care what its own NETWORK_INTERFACE is.

Should the default for ENABLE_IPV6 be FALSE if only link-local IPv6 addresses are found?

I'm tempted to say 'yes'. On all machines with link-local only, condor_config_val is showing IPV6_ADDRESS = ::1 - so it seems to ignore the link-local interfaces in some cases already.

	I'll write up a feature request for this.

That's the part that is confusing me about the incident: all of the PREFER_IPV4+friends knobs are using their default values, i.e. True. The IPv4 addresses are present on all machines, and have been used by HTCondor for months.

When choosing between two different outbound addresses, PREFER_IPV4 only prefers IPv4 over IPv6 if the addresses are otherwise equally "desirable". "Desirable" means "public over private over non-IPv6 link-local over loopback over IPv6 link-local". So, in your case, with a private IPv4 address and a public IPv6 address, PREFER_IPV4 won't come into play -- HTCondor will prefer the public address.
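
The ordering above can be sketched in Python as a ranking with PREFER_IPV4 as a tie-breaker. Again, this is only an illustration under stated assumptions, not HTCondor's implementation: the function names are made up, "private" is taken to mean whatever Python's ipaddress module classifies as private, and the sample addresses are arbitrary.

```python
import ipaddress


def desirability(addr: str) -> int:
    """Rank an address; higher is more desirable (hypothetical helper).

    Order from the description above: public > private >
    non-IPv6 link-local > loopback > IPv6 link-local.
    """
    ip = ipaddress.ip_address(addr)
    # Link-local and loopback are tested first because Python also
    # reports both of them as "private".
    if ip.is_link_local:
        return 0 if ip.version == 6 else 2
    if ip.is_loopback:
        return 1
    if ip.is_private:
        return 3
    return 4


def choose(addrs, prefer_ipv4=True):
    """Pick the most desirable address; the preference only breaks ties."""
    def key(addr):
        ip = ipaddress.ip_address(addr)
        preferred = (ip.version == 4) == prefer_ipv4
        return (desirability(addr), 1 if preferred else 0)
    return max(addrs, key=key)


# A private IPv4 address against a public IPv6 address: the public one wins,
# so the IPv4 preference never comes into play.
print(choose(["10.0.0.5", "2001:4860:4860::8888"]))
# Two private addresses: equally desirable, so the IPv4 preference decides.
print(choose(["10.0.0.5", "fd00::1"]))
```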

I've set ENABLE_IPV6=FALSE on the central node, which fixed the issue for now. Being able to safely set 'auto' on all machines would greatly simplify adopting dual stack, though.

	Noted. :)

At any rate, I have to say that we feel *safe enough* to tackle IPv6 for condor soon. Most of the issues are *definitely* our fault, and something tells me this might be the case for the others as well. Either way, the condor cluster remained stable enough for us to run some extensive tests in the future.

	Glad to hear it.

- ToddM