Re: [Condor-users] GCB "Unable to determine local IP address"
- Date: Thu, 14 Feb 2008 09:57:12 -0800
- From: "Gideon Juve" <juve@xxxxxxx>
- Subject: Re: [Condor-users] GCB "Unable to determine local IP address"
> Assuming your glidein target is Linux, and assuming your config file is
> not setting NETWORK_INTERFACE.
I tried setting NETWORK_INTERFACE and got the same result. There's
something in the manual that says NET_REMAP_ENABLE turns on
BIND_ALL_INTERFACES. Would that cause NETWORK_INTERFACE to be ignored?
> Unfortunately, I think some clues may have appeared in your MasterLog
> file on lines earlier than where you started above. Could you send the
> entire MasterLog? Specifically, I am hoping that a line(s) containing
> "_all_myIP" appears in the log, since it looks like this is the
> underlying function that failed --- and it looks like it always will log
> why it is failing.
Here's the entire log from a later test:
2/12 18:03:08 ******************************************************
2/12 18:03:08 ** condor_master (CONDOR_MASTER) STARTING UP
2/12 18:03:08 ** /home/geovault-00/juve/Condor_glidein/7.0.0-x86_64-pc-Linux-2.4-glibc2.3/condor_master
2/12 18:03:08 ** $CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
2/12 18:03:08 ** $CondorPlatform: X86_64-LINUX_RHEL3 $
2/12 18:03:08 ** PID = 5784
2/12 18:03:08 ** Log last touched time unavailable (No such file or directory)
2/12 18:03:08 ******************************************************
2/12 18:03:08 Using config source: /home/geovault-00/juve/Condor_glidein/glidein_condor_config
2/12 18:03:08 GCB: GCB_bind(7[GCB_SOCKET], 0x7fbfffe7a0-><0.0.0.0:0>): Unable to determine local IP address (_myIP failed).
2/12 18:03:08 Failed to bind to command ReliSock
2/12 18:03:08 (Make sure your IP address is correct in /etc/hosts.)
2/12 18:03:08 ERROR "BindAnyCommandPort() failed" at line 8385 in file daemon_core.C
> Possible reasons why it could fail:
> 1. out of memory (malloc returns NULL)
> 2. failure to open a datagram socket
> 3. failure to call ioctl SIOCGIFCONF to get list of all interfaces
> 4. failure to ioctl SIOCGIFFLAGS to find out if an interface is up
> 5. more than 10 network interfaces on the machine
>
> Thoughts on the above:
> (1) - does not seem likely.
> (2) - perhaps a limit on number of descriptors or sockets for this user
> is being hit? could try running "limit" as the glidein user.
The limit on descriptors is set to 4096.
> (3) - don't know why this would fail unless there is some uncommon
> permission settings going on. can you successfully run "/sbin/ifconfig"
> as the glidein user?
> (4) - same thoughts as #3. also, are any of the interfaces listed from
> ifconfig really strange? maybe we are failing to see if some virtual
> interface, like from some VPN software, is up or down and that is
> confusing GCB.
> (5) - when you run /sbin/ifconfig, how many entries do you see? If the
> answer is 10 or more, I think we have discovered the problem. The GCB
> code has a static limit of 10 network interfaces. If this is indeed the
> problem you are hitting, we could improve this.
I can run ifconfig on the nodes. It shows two interfaces: eth0 and lo.
eth0      Link encap:Ethernet  HWaddr 00:11:43:32:D7:F9
          inet addr:10.255.255.247  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::211:43ff:fe32:d7f9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:829942 errors:0 dropped:0 overruns:0 frame:0
          TX packets:965888 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:161701145 (154.2 MiB)  TX bytes:159499192 (152.1 MiB)
          Base address:0xdcc0 Memory:dfae0000-dfb00000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:28031 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28031 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3865070 (3.6 MiB)  TX bytes:3865070 (3.6 MiB)
> Another idea is to side-step the problem altogether by specifying
> NETWORK_INTERFACE in your glidein's config file, although I realize this
> could be a pain in the rear to do, and ideally I'd like to understand
> why the above setup is failing in your environment....
I am wondering if there might be an issue with _myIP and _all_myIP on
some platforms. If I compile GCB on the nodes and call those two
functions from a test program linked with libGCBcomm.a, _all_myIP
returns only 127.0.0.1, and that causes _myIP to fail. Specifically, I
think the loop in _all_myIP that walks the interface list advances its
pointer by the wrong amount. If I change:

    ptr += sizeof(ifr->ifr_name) + len;

to:

    ptr += sizeof(struct ifreq);

then _all_myIP returns the other interface(s) on the machines I am
having problems with.
Gideon
/* Test program: calls GCB's _all_myIP() and _myIP() directly.
 * Link against libGCBcomm.a. */
#include <stdio.h>
#include <stdint.h>

/* Provided by libGCBcomm.a */
int _all_myIP(uint32_t *ip_arr, int *size);
int _myIP(uint32_t *ipaddr);

/* Print an address low byte first, one per line. */
static void print_ip(uint32_t ipaddr)
{
    printf("%u.%u.%u.%u\n",
           (unsigned)((ipaddr >> 0) & 0xff),
           (unsigned)((ipaddr >> 8) & 0xff),
           (unsigned)((ipaddr >> 16) & 0xff),
           (unsigned)((ipaddr >> 24) & 0xff));
}

int test_all_myIP(void)
{
    uint32_t ip_arr[10];
    int size = 10;   /* passed in as the array capacity, read back as the count */
    int i, res;

    res = _all_myIP(ip_arr, &size);
    if (res < 0) return res;

    printf("_all_myIP(): %d\n", size);
    for (i = 0; i < size; i++)
        print_ip(ip_arr[i]);
    return res;
}

int test_myIP(void)
{
    uint32_t ipaddr;
    int res;

    res = _myIP(&ipaddr);
    if (res < 0) return res;

    printf("_myIP():\n");
    print_ip(ipaddr);
    return res;
}

int main(void)
{
    int res;

    res = test_all_myIP();
    printf("_all_myIP() returned %d\n", res);
    res = test_myIP();
    printf("_myIP() returned %d\n", res);
    return 0;
}
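
One caveat about the test program: printing the low byte first only yields the right dotted quad on a little-endian host, and only if the functions return addresses in network byte order. That ordering is an assumption based on how the program decodes them and has not been confirmed from the GCB source. A byte-order-independent alternative is to let inet_ntop do the formatting. A minimal sketch, where format_ipv4 is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: format an IPv4 address held in network byte order.
 * buf should hold at least INET_ADDRSTRLEN bytes.
 * Returns buf on success, NULL on failure (as inet_ntop does). */
const char *format_ipv4(uint32_t addr_net, char *buf, size_t buflen)
{
    struct in_addr in;

    assert(buf != NULL);
    in.s_addr = addr_net;   /* s_addr is defined to be in network byte order */
    return inet_ntop(AF_INET, &in, buf, (socklen_t)buflen);
}
```

In the test program this would replace the four shift-and-mask arguments with a single call, for example printf("%s\n", format_ipv4(ip_arr[i], buf, sizeof(buf))) with char buf[INET_ADDRSTRLEN].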