
[Condor-users] Problems with a dual homed condor server



I am trying to deploy a Condor schedd on a dual-homed system. The system accepts connections from the outside through grid middleware and then runs a local condor_submit, which submits the job to a central collector via the local schedd. The schedd should communicate with the collector and the node startd(s) only over the internal interface, which has a private IP address.

Here is my error.

5/19 12:18:53 Using config file: /etc/condor/condor_config
5/19 12:18:53 Using local config files: /data/osg-0.1.5/condor/local.t2data4/condor_config.local
5/19 12:18:53 DaemonCore: Command Socket at <192.168.1.14:32791>
5/19 12:18:54 Started DaemonCore process "/data/osg-0.1.5/condor/sbin/condor_schedd", pid and pgroup = 3183
5/19 12:19:00 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32808>
5/19 12:19:00 DC_AUTHENTICATE: auth ip -> 198.202.74.80
5/19 12:19:00 DC_AUTHENTICATE: ERROR: IP not in agreement!!! BAILING!
5/19 12:38:30 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32985>
5/19 12:38:30 DC_AUTHENTICATE: auth ip -> 198.202.74.80


I have run Condor on multi-homed machines before and have never seen this error. It looks like Condor is taking the first interface (eth0, 198.202.74.80) for the auth IP but not for the socket IP, which is 192.168.1.14. How can this be if I specify NETWORK_INTERFACE?
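
To make the mismatch in the log explicit, here is a minimal sketch in Python. It is a hypothetical helper, not part of Condor: the function name and regular expressions are my own and assume only the two DC_AUTHENTICATE line shapes that appear in the log excerpt above.

```python
import re

# These patterns assume the two line shapes seen in the log excerpt;
# other Condor versions may format these lines differently.
SOCK_RE = re.compile(r"DC_AUTHENTICATE: sock ip -> <(\d+\.\d+\.\d+\.\d+):\d+>")
AUTH_RE = re.compile(r"DC_AUTHENTICATE: auth ip -> (\d+\.\d+\.\d+\.\d+)")


def find_ip_mismatches(log_text):
    """Return (sock_ip, auth_ip) pairs where the two addresses differ."""
    mismatches = []
    pending_sock = None  # sock ip waiting for its matching auth ip line
    for line in log_text.splitlines():
        m = SOCK_RE.search(line)
        if m:
            pending_sock = m.group(1)
            continue
        m = AUTH_RE.search(line)
        if m and pending_sock is not None:
            if m.group(1) != pending_sock:
                mismatches.append((pending_sock, m.group(1)))
            pending_sock = None
    return mismatches


# Lines copied from the log excerpt in this message.
SAMPLE_LOG = """\
5/19 12:19:00 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32808>
5/19 12:19:00 DC_AUTHENTICATE: auth ip -> 198.202.74.80
5/19 12:19:00 DC_AUTHENTICATE: ERROR: IP not in agreement!!! BAILING!
5/19 12:38:30 DC_AUTHENTICATE: sock ip -> <192.168.1.14:32985>
5/19 12:38:30 DC_AUTHENTICATE: auth ip -> 198.202.74.80
"""

for sock_ip, auth_ip in find_ip_mismatches(SAMPLE_LOG):
    print(f"sock ip {sock_ip} (eth1) != auth ip {auth_ip} (eth0)")
```

Both connection attempts in the excerpt pair the eth1 address on the socket side with the eth0 address on the auth side.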

Here are my configs

/etc/condor/condor_config:

# collector
CONDOR_HOST     = t2cdf01.local

/data/osg-0.1.5/condor/local.t2data4/condor_config.local:

NETWORK_INTERFACE = 192.168.1.14
CONDOR_HOST = t2cdf01.local

Here are a few details of my network setup

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:81:FC:BE
         inet addr:198.202.74.80  Bcast:198.202.74.255  Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:70432 errors:0 dropped:0 overruns:0 frame:0
         TX packets:42263 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:19460279 (18.5 Mb)  TX bytes:10153427 (9.6 Mb)
         Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:30:48:81:FC:BF
         inet addr:192.168.1.14  Bcast:192.168.255.255  Mask:255.255.0.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:165956 errors:0 dropped:0 overruns:0 frame:0
         TX packets:712 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:15639240 (14.9 Mb)  TX bytes:110828 (108.2 Kb)
         Interrupt:19

# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.1.14            t2data4.sdsc.edu t2data4

# hostname
t2data4

# cat /etc/resolv.conf
search local
nameserver 192.168.21.1
nameserver 192.168.1.13

I have tried various permutations of hosts files and hostname settings, rebooted, and so on.
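
As a sanity check on the current hosts file, here is a small Python sketch. The helper name is hypothetical, and it only reads hosts-format text: it does not query DNS and does not reproduce whatever lookup Condor itself performs. It confirms that the short hostname maps to the NETWORK_INTERFACE address.

```python
def hosts_lookup(hosts_text, name):
    """Return the first address that hosts-format text maps `name` to, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        if name in names:
            return addr
    return None


# Contents of /etc/hosts as shown in this message.
HOSTS = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.1.14            t2data4.sdsc.edu t2data4
"""

NETWORK_INTERFACE = "192.168.1.14"  # value from condor_config.local

for host in ("t2data4", "t2data4.sdsc.edu"):
    addr = hosts_lookup(HOSTS, host)
    status = "matches" if addr == NETWORK_INTERFACE else "does NOT match"
    print(f"{host} -> {addr} ({status} NETWORK_INTERFACE)")
```

With the file as shown, both names map to the internal address, so the hosts file alone does not account for the 198.202.74.80 auth IP.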

The only explanation I have is that something is still ignoring the NETWORK_INTERFACE directive and just picking the first network interface. My next step is to swap the network cables and reconfigure the interfaces so that the internal interface comes up first, which means a trip to the computer center. Is there no way to strictly specify my internal IP for this auth IP via a config file? Note that if I drop the NETWORK_INTERFACE directive, things work fine, at least locally: everything just uses the first interface.

Thanks

Terrence
UCSD