
[HTCondor-users] automatic selection of advertised IP



Hi all,

I am trying to figure out why our publicly reachable schedd nodes are consistently picking the wrong network interface.

According to the docs, with nothing set explicitly the daemons should advertise the interface they use to talk to the collector (our effective settings are in [4]):

	NETWORK_INTERFACE
	[...] If multiple network interfaces match the value and ENABLE_ADDRESS_REWRITING is True (the default), the IP address that is chosen to be advertised will be the one that is used to communicate with the condor_collector. [...]

All daemons (Master, Schedd, SharedPort) resolve our collector to its internal address [1].
However, at startup they have already chosen their own external address to advertise [2].
That address is never updated to the internal one: SharedPort regularly refreshes its statistics but keeps advertising the external address [3].
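To cross-check which local address the kernel itself would use to reach the collector, a small sketch like the following can help. It is only a diagnostic under my own assumptions (the function name `local_ip_for` is made up, and this is not how HTCondor selects its address); it relies on the fact that connect() on a UDP socket sends nothing and just makes the kernel pick a route.

```python
import socket


def local_ip_for(peer_host: str, peer_port: int = 9618) -> str:
    """Return the local IPv4 address the kernel would use to reach the peer.

    Hypothetical helper for diagnosis only; not part of HTCondor.
    """
    # connect() on a UDP socket transmits no packets. It only asks the
    # kernel to choose a route, which also fixes the source address.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((peer_host, peer_port))
        return sock.getsockname()[0]


# Example call, using the collector address from log [1]:
#   local_ip_for("10.97.13.108")
```

If the result on the schedd node is the eth1 address, routing to the collector really does go over the internal interface, and the mismatch lies in what the daemons advertise.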

As a workaround, I am currently forcing the internal address with
	NETWORK_INTERFACE=10.*
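To illustrate why this pattern narrows the choice to a single interface, here is a rough sketch that applies shell-style wildcard matching to the IPv4 addresses listed in log [2]. It assumes plain glob semantics via Python's fnmatch; HTCondor's own matching may differ in detail, and the names `INTERFACES` and `matching_interfaces` are made up for the example.

```python
from fnmatch import fnmatch

# IPv4 interface addresses as listed in log [2].
INTERFACES = {
    "lo": "127.0.0.1",
    "eth0": "192.108.45.30",
    "eth1": "10.33.1.130",
}


def matching_interfaces(pattern: str, interfaces: dict) -> dict:
    """Return the {name: ip} entries whose IP matches the glob pattern.

    Assumption: simple shell-style globbing, not HTCondor's actual code.
    """
    return {name: ip for name, ip in interfaces.items() if fnmatch(ip, pattern)}


# "*" leaves all three candidates, so the daemon still has to pick one;
# "10.*" leaves only eth1.
all_candidates = matching_interfaces("*", INTERFACES)
internal_only = matching_interfaces("10.*", INTERFACES)
```

With the default `*` every interface is a candidate and some further selection step decides the winner, which is where the external address gets chosen in our case.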

Cheers,
Max

[1] /var/log/condor/SharedPortLog
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) COLLECTOR_HOST is set to "lrms-htcondor-1-kit.gridka.de"
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Checking if lrms-htcondor-1-kit.gridka.de is a sinful address
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) lrms-htcondor-1-kit.gridka.de is not a sinful address: does not begin with "<"
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) New Daemon obj (collector) name: "lrms-htcondor-1-kit.gridka.de", pool: "NULL", addr: "NULL"
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Using name "lrms-htcondor-1-kit.gridka.de" to find daemon
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Port not specified, using default (9618)
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Host info "lrms-htcondor-1-kit.gridka.de" is a hostname, finding IP address
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) DNS returned:
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME)      10.97.13.108
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) We returned:
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME)      10.97.13.108
02/14/17 14:38:18 (pid:12498) (D_HOSTNAME) Found IP address and port <10.97.13.108:9618>

[2] /var/log/condor/SharedPortLog
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) NETWORK_INTERFACE=* matches lo 127.0.0.1, eth0 192.108.45.30, eth1 10.33.1.130, lo ::1, eth0 fe80::217:8ff:fe50:d732, eth1 fe80::217:8ff:fe50:d731, choosing IP 192.108.45.30
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) DNS returned:
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)      10.33.1.130
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)      192.108.45.30
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) We returned:
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)      10.33.1.130
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)      192.108.45.30
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME)    I like it.
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) hostname: gridka30 (score 4) new winner
02/14/17 14:20:37 (pid:12162) (D_HOSTNAME) I am: hostname: gridka30, fully qualified doman name: gridka30, IP: 192.108.45.30, IPv4: 192.108.45.30, IPv6: ::1

[3] /var/log/condor/SharedPortLog
02/14/17 14:53:18 (pid:12498) (D_ALWAYS) About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
ForkedChildrenPeak = 0
RequestsBlocked = 34
RequestsPendingCurrent = 0
MyAddress = "<192.108.45.30:9618?addrs=192.108.45.30-9618+[--1]-9618&noUDP>"
RequestsPendingPeak = 1
RequestsFailed = 34
SharedPortCommandSinfuls = "<192.108.45.30:9618>,<[::1]:9618>"
ForkedChildrenCurrent = 0
RequestsSucceeded = 6

[4] $ condor_config_val IP_ADDRESS ENABLE_ADDRESS_REWRITING NETWORK_INTERFACE BIND_ALL_INTERFACES
192.108.45.30
true
*
true
