
Re: [Condor-users] Issue with connecting nodes to pool/master



In my efforts to truncate, I accidentally removed some vital information for
you. This time I'm not even going to truncate.

[root@TRANSLTR log]# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address   State
tcp        0      0 TRANSLTR.netfeds.com:49952  *:*               LISTEN
tcp        0      0 TRANSLTR.netfeds.com:9614   *:*               LISTEN
tcp        0      0 *:sunrpc                    *:*               LISTEN
tcp        0      0 TRANSLTR.netfeds.com:9618   *:*               LISTEN
tcp        0      0 TRANSLTR.netfeds.com:47251  *:*               LISTEN
tcp        0      0 TRANSLTR.netfeds.com:44599  *:*               LISTEN
tcp        0      0 localhost.localdomain:ipp   *:*               LISTEN
tcp        0      0 localhost.localdomain:5335  *:*               LISTEN
tcp        0      0 localhost.localdomain:smtp  *:*               LISTEN
tcp        0      0 *:51867                     *:*               LISTEN
tcp        0      0 TRANSLTR.netfeds.com:44574  *:*               LISTEN
tcp        0      0 *:ssh                       *:*               LISTEN
udp        0      0 *:32768                     *:*
udp        0      0 TRANSLTR.netfeds.com:9614   *:*
udp        0      0 TRANSLTR.netfeds.com:9618   *:*
udp        0      0 TRANSLTR.netfeds.com:47251  *:*
udp        0      0 TRANSLTR.netfeds.com:44574  *:*
udp        0      0 TRANSLTR.netfeds.com:49952  *:*
udp        0      0 TRANSLTR.netfeds.com:44599  *:*
udp        0      0 *:bootpc                    *:*
udp        0      0 *:5353                      *:*
udp        0      0 *:sunrpc                    *:*
udp        0      0 *:757                       *:*
udp        0      0 *:ipp                       *:*
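
Since the listing shows the collector listening on TCP 9618, the remaining
question is whether node0 can actually reach that port. Below is a minimal
sketch of that check in Python, meant to be run on node0. The helper name
probe_tcp and the 5 second timeout are my own choices, not anything from
Condor. On Linux, errno 113 is EHOSTUNREACH; seeing it on 9618 while port 22
connects fine is often (though not always) a sign that a packet filter on the
target is rejecting the connection, rather than a real routing problem.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connect and return a short status string.

    Returns "open" on success, "timeout" if nothing answered in time,
    or the symbolic errno name (for example "EHOSTUNREACH") on failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: packets are probably being dropped silently.
        return "timeout"
    except OSError as exc:
        # Map the numeric errno to its name; fall back to the message
        # for errors without a known errno (e.g. name resolution).
        return errno.errorcode.get(exc.errno, str(exc))


# Example use from node0 (addresses taken from the output above):
#   probe_tcp("192.168.1.102", 22)    # ssh, known to work
#   probe_tcp("192.168.1.102", 9618)  # condor_collector
```

If the first call reports "open" and the second reports "EHOSTUNREACH", the
host itself is reachable and the difference is specific to the port, which
points at filtering rules on 192.168.1.102 as the next thing to inspect.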




>>> > What machine is the log file below from? You should only have
>>> > a NegotiatorLog on one machine, the central manager.
>>> 1.101.
>>>
>>
>> Well, there's problem 1. Turn off your negotiator on 101. You should
>> only have a negotiator on 102.
> Done
>
>>> > Errno 113 is "No Route To Host". Do you have your networking properly
>>> > configured (i.e., can you ping your central manager from all your other
>>> > machines?)
>>> All ICMP, UDP, and TCP traffic is passing properly. You name the service,
>>> I am able to transfer traffic: 21/22/23/80, etc.
>>>
>>
>> 9618 :)
> LoL
> [root@node0 log]# ping TRANSLTR
> PING TRANSLTR.netfeds.com (192.168.1.102) 56(84) bytes of data.
> 64 bytes from TRANSLTR.netfeds.com (192.168.1.102): icmp_seq=1 ttl=64 time=1.88 ms
>
> [root@TRANSLTR log]# ping node0
> PING node0.netfeds.com (192.168.1.101) 56(84) bytes of data.
> 64 bytes from node0.netfeds.com (192.168.1.101): icmp_seq=0 ttl=64 time=0.176 ms
>
>
>>> 6/21 09:24:14 ERROR "Required attribute "START" is not defined" at line 255 in file util.C
>>>
>>
>> There's another problem. Make sure that you have a
>> START = <some valid expression> in either
>> /usr/local/condor/etc/condor_config
> Done. The "START" error is resolved; however, the following still happens
> on node0: no route to host.
>
>
> 6/27 19:22:17 ******************************************************
> 6/27 19:22:17 ** condor_startd (CONDOR_STARTD) STARTING UP
> 6/27 19:22:17 ** /usr/local/condor/sbin/condor_startd
> 6/27 19:22:17 ** $CondorVersion: 6.6.11 Mar 23 2006 $
> 6/27 19:22:17 ** $CondorPlatform: I386-LINUX_RH9 $
> 6/27 19:22:17 ** PID = 2769
> 6/27 19:22:17 ******************************************************
> 6/27 19:22:17 Using config file: /usr/local/condor/etc/condor_config
> 6/27 19:22:17 Using local config files: /usr/local/condor/local.node0/condor_config.local
> 6/27 19:22:17 DaemonCore: Command Socket at <192.168.1.101:55977>
> 6/27 19:22:24 New machine resource allocated
> 6/27 19:22:24 Failed to obtain keyboard or mouse idle information.
> 6/27 19:22:24 Assuming the keyboard and mouse to be infinitely idle.
> 6/27 19:22:24 About to run initial benchmarks.
> 6/27 19:22:32 Completed initial benchmarks.
> 6/27 19:22:32 State change: IS_OWNER is false
> 6/27 19:22:32 Changing state: Owner -> Unclaimed
> 6/27 19:22:36 Can't connect to <192.168.1.102:9618>:0, errno = 113
> 6/27 19:22:36 Will keep trying for 10 seconds...
>
> The log on the 'central manager' ;)
>
> 6/27 19:13:32 Housekeeper:  Ready to clean old ads
> 6/27 19:13:32   Cleaning StartdAds ...
> 6/27 19:13:32   Cleaning StartdPrivateAds ...
> 6/27 19:13:32   Cleaning ScheddAds ...
> 6/27 19:13:32   Cleaning SubmittorAds ...
> 6/27 19:13:32   Cleaning LicenseAds ...
> 6/27 19:13:32   Cleaning MasterAds ...
> 6/27 19:13:32   Cleaning CkptServerAds ...
> 6/27 19:13:32   Cleaning CollectorAds ...
> 6/27 19:13:32   Cleaning StorageAds ...
> 6/27 19:13:32 Housekeeper:  Done cleaning
> 6/27 19:13:33 (Sent 3 ads in response to query)
> 6/27 19:13:33 Got QUERY_STARTD_PVT_ADS
> 6/27 19:13:33 (Sent 1 ads in response to query)
> 6/27 19:18:33 (Sent 3 ads in response to query)
> 6/27 19:18:33 Got QUERY_STARTD_PVT_ADS
> 6/27 19:18:33 (Sent 1 ads in response to query)
>
>
>
> ----
> Some testing
>
> [root@node0 log]# telnet 192.168.1.102 22
> Trying 192.168.1.102...
> Connected to TRANSLTR.netfeds.com (192.168.1.102).
> Escape character is '^]'.
> SSH-2.0-OpenSSH_4.2
>
> [root@TRANSLTR log]# netstat -l
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address   State
> ~~~~~ TRUNCATED ~~~~
> udp        0      0 TRANSLTR.netfeds.com:9614   *:*
> udp        0      0 TRANSLTR.netfeds.com:9618   *:*
> udp        0      0 TRANSLTR.netfeds.com:47251  *:*
> udp        0      0 TRANSLTR.netfeds.com:44574  *:*
> udp        0      0 TRANSLTR.netfeds.com:49952  *:*
> udp        0      0 TRANSLTR.netfeds.com:44599  *:*
> ~~~~~ TRUNCATED ~~~~
>
> [root@node0 log]# telnet 192.168.1.102 9618
> Trying 192.168.1.102...
> telnet: connect to address 192.168.1.102: No route to host
> telnet: Unable to connect to remote host: No route to host
>
> [root@TRANSLTR log]# ps -eef | grep condor
> condor    3481     1  0 18:58 ?        00:00:00 /usr/local/condor/sbin/condor_master
> condor    3482  3481  0 18:58 ?        00:00:00 condor_collector -f
> condor    3483  3481  0 18:58 ?        00:00:00 condor_schedd -f
> condor    3484  3481  0 18:58 ?        00:00:03 condor_startd -f
> condor    3485  3481  0 18:58 ?        00:00:00 condor_negotiator -f
>
> [root@node0 log]# ps -eef | grep condor
> daemon    2766     1  0 19:22 ?        00:00:00 /usr/local/condor/sbin/condor_master
> daemon    2767  2766  0 19:22 ?        00:00:00 condor_schedd -f
> daemon    2769  2766  7 19:22 ?        00:00:08 condor_startd -f