
Re: [Condor-users] GCB "Unable to determine local IP address"



Gideon Juve wrote:
We have been trying to run glideins with GCB on several different
systems without any success. I am hoping someone will be able to help
us out.

The glideins are configured with:

NET_REMAP_ENABLE = true
NET_REMAP_SERVICE = GCB
NET_REMAP_INAGENT = xxx.xxx.xxx.xxx

Where xxx.xxx.xxx.xxx is the IP of the server running the gcb_broker.
When the glidein job starts it prints the following in MasterLog and
exits:

2/12 13:19:33 GCB: GCB_bind(6[GCB_SOCKET], 0x7fbfffe000-><0.0.0.0:0>,
16): Unable to determine local IP address.  (_myIP failed)
2/12 13:19:33 Failed to bind to command ReliSock
2/12 13:19:33 (Make sure your IP address is correct in /etc/hosts.)
2/12 13:19:33 ERROR "BindAnyCommandPort failed" at line 7059 in file
daemon_core.C

We are using Condor 7.0.0.

Does anyone know what is wrong?

I am assuming your glidein target is Linux and that your config file does not set NETWORK_INTERFACE.

Unfortunately, I think some clues may have appeared in your MasterLog file on lines earlier than where you started above. Could you send the entire MasterLog? Specifically, I am hoping that one or more lines containing "_all_myIP" appear in the log, since this looks like the underlying function that failed, and it looks like it always logs why it is failing.

Possible reasons why it could fail:
  1. out of memory (malloc returns NULL)
  2. failure to open a datagram socket
  3. failure to call ioctl SIOCGIFCONF to get list of all interfaces
  4. failure to ioctl SIOCGIFFLAGS to find out if an interface is up
  5. more than 10 network interfaces on the machine

Thoughts on the above:
(1) - does not seem likely.
(2) - perhaps a limit on the number of descriptors or sockets for this user is being hit? You could try running "limit" as the glidein user.
(3) - I don't know why this would fail unless some uncommon permission settings are in place. Can you successfully run "/sbin/ifconfig" as the glidein user?
(4) - same thoughts as (3). Also, are any of the interfaces listed by ifconfig really strange? Maybe we are failing to determine whether some virtual interface, like one from VPN software, is up or down, and that is confusing GCB.
(5) - when you run /sbin/ifconfig, how many entries do you see? If the answer is 10 or more, I think we have discovered the problem. The GCB code has a static limit of 10 network interfaces. If this is indeed the problem you are hitting, we could improve this.
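
If counting ifconfig entries by hand is awkward on the glidein target, a small script run as the glidein user can do roughly the same checks. This is only a sketch, not the GCB code: it assumes Linux and Python 3, it lists interfaces with socket.if_nameindex() (which may report more interfaces than SIOCGIFCONF does, since it includes ones with no IPv4 address), and the function names and the constant GCB_INTERFACE_LIMIT are made up for illustration, with the value 10 taken from the description above rather than read from GCB.

```python
import fcntl
import socket
import struct

SIOCGIFFLAGS = 0x8913      # Linux ioctl request: get interface flags
IFF_UP = 0x1               # flag bit set when the interface is up
GCB_INTERFACE_LIMIT = 10   # assumption: the static limit described in this thread


def interface_report():
    """Return (name, is_up) for each interface; is_up is None if the ioctl fails."""
    report = []
    # Interface ioctls are issued on a datagram socket, as in reason 2.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for _index, name in socket.if_nameindex():
            # The request buffer starts with the interface name (max 15 bytes + NUL).
            request = struct.pack("256s", name.encode()[:15])
            try:
                reply = fcntl.ioctl(sock.fileno(), SIOCGIFFLAGS, request)
                # The flags are an unsigned short stored right after the 16-byte name.
                flags = struct.unpack("H", reply[16:18])[0]
                report.append((name, bool(flags & IFF_UP)))
            except OSError as err:
                # Corresponds to reason 4: could not find out whether it is up.
                print("SIOCGIFFLAGS failed for %s: %s" % (name, err))
                report.append((name, None))
    finally:
        sock.close()
    return report


def at_or_over_limit(count, limit=GCB_INTERFACE_LIMIT):
    """True when the interface count is 10 or more (the cautious reading)."""
    return count >= limit


if __name__ == "__main__":
    entries = interface_report()
    for name, is_up in entries:
        print(name, "up" if is_up else "down or unknown")
    print("interfaces:", len(entries))
    if at_or_over_limit(len(entries)):
        print("at or over the assumed GCB limit of", GCB_INTERFACE_LIMIT)
```

If the socket cannot be created at all, the script stops with an OSError, which would point at reason 2 rather than the interface count.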

Another idea is to side-step the problem altogether by specifying NETWORK_INTERFACE in your glidein's config file, although I realize this could be a pain in the rear to do, and ideally I'd like to understand why the above setup is failing in your environment.
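
For illustration only, the glidein config with that workaround might look like the following. Here yyy.yyy.yyy.yyy is a placeholder for the IP address of the machine the glidein lands on (not the broker's address), which is why it has to be filled in per machine:

```
NET_REMAP_ENABLE = true
NET_REMAP_SERVICE = GCB
NET_REMAP_INAGENT = xxx.xxx.xxx.xxx
NETWORK_INTERFACE = yyy.yyy.yyy.yyy
```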

regards,
Todd