
Re: [Condor-users] Issue with connecting nodes to pool/master



>> > What machine is the log file below from? You should only have
>> > a NegotiatorLog on one machine, the central manager.
>> 1.101.
>>
>
> Well, there's problem 1. Turn off your negotiator on 101. You should only
> have a negotiator on 102.
Done

>> > Errno 113 is "No Route To Host". Do you have your networking properly
>> > configured (ie can you ping your central manager from all your other
>> > machines?)
>> All ICMP, UDP, and TCP traffic is passing properly. Name the service and I
>> am able to pass traffic to it: 21/22/23/80, etc.
>>
>
> 9618 :)
LoL
[root@node0 log]# ping TRANSLTR
PING TRANSLTR.netfeds.com (192.168.1.102) 56(84) bytes of data.
64 bytes from TRANSLTR.netfeds.com (192.168.1.102): icmp_seq=1 ttl=64 time=1.88 ms

[root@TRANSLTR log]# ping node0
PING node0.netfeds.com (192.168.1.101) 56(84) bytes of data.
64 bytes from node0.netfeds.com (192.168.1.101): icmp_seq=0 ttl=64 time=0.176 ms


>> 6/21 09:24:14 ERROR "Required attribute "START" is not defined" at line
>> 255 in file util.C
>>
>
> There's another problem. Make sure that you have a
> START = <some valid expression> in either
> /usr/local/condor/etc/condor_config
Done. The "START" error is resolved; however, the following still happens on
node0: no route to host.


6/27 19:22:17 ******************************************************
6/27 19:22:17 ** condor_startd (CONDOR_STARTD) STARTING UP
6/27 19:22:17 ** /usr/local/condor/sbin/condor_startd
6/27 19:22:17 ** $CondorVersion: 6.6.11 Mar 23 2006 $
6/27 19:22:17 ** $CondorPlatform: I386-LINUX_RH9 $
6/27 19:22:17 ** PID = 2769
6/27 19:22:17 ******************************************************
6/27 19:22:17 Using config file: /usr/local/condor/etc/condor_config
6/27 19:22:17 Using local config files: /usr/local/condor/local.node0/condor_config.local
6/27 19:22:17 DaemonCore: Command Socket at <192.168.1.101:55977>
6/27 19:22:24 New machine resource allocated
6/27 19:22:24 Failed to obtain keyboard or mouse idle information.
6/27 19:22:24 Assuming the keyboard and mouse to be infinitely idle.
6/27 19:22:24 About to run initial benchmarks.
6/27 19:22:32 Completed initial benchmarks.
6/27 19:22:32 State change: IS_OWNER is false
6/27 19:22:32 Changing state: Owner -> Unclaimed
6/27 19:22:36 Can't connect to <192.168.1.102:9618>:0, errno = 113
6/27 19:22:36 Will keep trying for 10 seconds...
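
For anyone reading along, here is a small sketch of how the failure line above can be decoded. It is only an illustration: the regular expression is my own guess at the line format based on this one log entry, and `explain_connect_failure` is a made-up helper, not part of Condor. Errno numbers are platform-specific; on Linux, 113 is EHOSTUNREACH, the "No Route To Host" mentioned earlier in the thread.

```python
import errno
import os
import re

# Pattern for the startd log line quoted above, e.g.
#   Can't connect to <192.168.1.102:9618>:0, errno = 113
# (assumed format, derived only from that single line)
CONNECT_FAILURE = re.compile(
    r"Can't connect to <(?P<host>[^:>]+):(?P<port>\d+)>.*errno = (?P<errno>\d+)"
)


def explain_connect_failure(line):
    """Return (host, port, errno_name, message), or None if the line does not match."""
    match = CONNECT_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group("errno"))
    # errno.errorcode maps the numeric code to its symbolic name on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group("host"), int(match.group("port")), name, os.strerror(code)


if __name__ == "__main__":
    sample = "6/27 19:22:36 Can't connect to <192.168.1.102:9618>:0, errno = 113"
    print(explain_connect_failure(sample))
```

Run on the machine that produced the log, so the number is interpreted with that platform's errno table.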

The log on the 'central manager' ;)

6/27 19:13:32 Housekeeper:  Ready to clean old ads
6/27 19:13:32   Cleaning StartdAds ...
6/27 19:13:32   Cleaning StartdPrivateAds ...
6/27 19:13:32   Cleaning ScheddAds ...
6/27 19:13:32   Cleaning SubmittorAds ...
6/27 19:13:32   Cleaning LicenseAds ...
6/27 19:13:32   Cleaning MasterAds ...
6/27 19:13:32   Cleaning CkptServerAds ...
6/27 19:13:32   Cleaning CollectorAds ...
6/27 19:13:32   Cleaning StorageAds ...
6/27 19:13:32 Housekeeper:  Done cleaning
6/27 19:13:33 (Sent 3 ads in response to query)
6/27 19:13:33 Got QUERY_STARTD_PVT_ADS
6/27 19:13:33 (Sent 1 ads in response to query)
6/27 19:18:33 (Sent 3 ads in response to query)
6/27 19:18:33 Got QUERY_STARTD_PVT_ADS
6/27 19:18:33 (Sent 1 ads in response to query)



----
Some testing

[root@node0 log]# telnet 192.168.1.102 22
Trying 192.168.1.102...
Connected to TRANSLTR.netfeds.com (192.168.1.102).
Escape character is '^]'.
SSH-2.0-OpenSSH_4.2

[root@TRANSLTR log]# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
~~~~~ TRUNCATED ~~~~
udp        0      0 TRANSLTR.netfeds.com:9614   *:*
udp        0      0 TRANSLTR.netfeds.com:9618   *:*
udp        0      0 TRANSLTR.netfeds.com:47251  *:*
udp        0      0 TRANSLTR.netfeds.com:44574  *:*
udp        0      0 TRANSLTR.netfeds.com:49952  *:*
udp        0      0 TRANSLTR.netfeds.com:44599  *:*
~~~~~ TRUNCATED ~~~~
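
The listing above is truncated, so it shows UDP sockets on 9618 but does not by itself say whether a TCP socket is listening there too. As a rough sketch, netstat text can be filtered for one port like this. `protocols_listening_on` is a hypothetical helper, and it assumes numeric ports in the Local Address column (as printed by `netstat -ln`), since without `-n` well-known ports appear as service names.

```python
def protocols_listening_on(netstat_output, port):
    """Return the set of protocols (e.g. {"udp"}) with a local socket on `port`."""
    protocols = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Socket rows start with the protocol; the local address is the 4th column.
        if len(fields) < 4 or not fields[0].startswith(("tcp", "udp")):
            continue
        local_address = fields[3]
        # The port follows the last colon, so host names and IPv6 addresses both work.
        if local_address.rsplit(":", 1)[-1] == str(port):
            protocols.add(fields[0])
    return protocols


if __name__ == "__main__":
    sample = (
        "udp        0      0 TRANSLTR.netfeds.com:9614   *:*\n"
        "udp        0      0 TRANSLTR.netfeds.com:9618   *:*\n"
    )
    print(protocols_listening_on(sample, 9618))
```

Feeding it the full, untruncated output would show whether "tcp" appears for 9618 alongside "udp".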

[root@node0 log]# telnet 192.168.1.102 9618
Trying 192.168.1.102...
telnet: connect to address 192.168.1.102: No route to host
telnet: Unable to connect to remote host: No route to host
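
The two telnet tests (port 22 connects, port 9618 gets "No route to host") can also be scripted, which makes it easier to repeat the check from every node. This is a minimal sketch using only the Python standard library; `probe_tcp` is a made-up name and the 5 second timeout is an arbitrary choice. It tests TCP only, which is what telnet and the failing startd connection use; ping exercises ICMP and proves nothing about a specific TCP port.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connection; return "open" or a short description of the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Report the symbolic errno name (e.g. ECONNREFUSED, EHOSTUNREACH) when known.
        name = errno.errorcode.get(exc.errno, "unknown")
        return "failed: %s (%s)" % (name, exc.strerror or exc)
```

For example, calling `probe_tcp("192.168.1.102", 22)` and `probe_tcp("192.168.1.102", 9618)` from node0 would repeat the two telnet tests. A result naming EHOSTUNREACH for one port while another port on the same host is open suggests something is answering for that port specifically (a packet filter is one possibility), but that is a guess the logs here do not confirm.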

[root@TRANSLTR log]# ps -eef | grep condor
condor    3481     1  0 18:58 ?        00:00:00 /usr/local/condor/sbin/condor_master
condor    3482  3481  0 18:58 ?        00:00:00 condor_collector -f
condor    3483  3481  0 18:58 ?        00:00:00 condor_schedd -f
condor    3484  3481  0 18:58 ?        00:00:03 condor_startd -f
condor    3485  3481  0 18:58 ?        00:00:00 condor_negotiator -f

[root@node0 log]# ps -eef | grep condor
daemon    2766     1  0 19:22 ?        00:00:00 /usr/local/condor/sbin/condor_master
daemon    2767  2766  0 19:22 ?        00:00:00 condor_schedd -f
daemon    2769  2766  7 19:22 ?        00:00:08 condor_startd -f