
RE: [Condor-users] linux configuration/condor configuration....



ok...

getting closer!!! i did a reboot of the 'central_manager' after making the
changes....

however, when i run 'condor_status' i'm getting the following in the
/home/condor/log/CollectorLog:

10/7 09:34:25 ** Master < lserver5 > rejuvenated from recently down
10/7 09:34:25 stats: Inserting new hashent for
'Master':'lserver5':'192.168.1.55'
10/7 09:34:27 Got QUERY_STARTD_ADS
10/7 09:34:27 (Sent 0 ads in response to query)
10/7 09:34:29 Got QUERY_STARTD_ADS
10/7 09:34:29 (Sent 0 ads in response to query)
10/7 09:34:34 StartdAd     : Inserting ** "< lserver5 , 192.168.1.55 >"
10/7 09:34:34 stats: Inserting new hashent for
'Start':'lserver5':'192.168.1.55'
10/7 09:34:34 StartdPvtAd  : Inserting ** "< lserver5 , 192.168.1.55 >"
10/7 09:34:34 stats: Inserting new hashent for
'StartdPvt':'lserver5':'192.168.1.55'
10/7 09:36:08 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:32806> for command 2 (UPDATE_MASTER_AD)
10/7 09:36:16 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:32806> for command 1 (UPDATE_SCHEDD_AD)
10/7 09:37:04 Got QUERY_STARTD_ADS
10/7 09:37:04 (Sent 1 ads in response to query)
10/7 09:37:33 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:32806> for command 0 (UPDATE_STARTD_AD)
10/7 09:39:20 (Sent 3 ads in response to query)
10/7 09:39:20 Got QUERY_STARTD_PVT_ADS
10/7 09:39:20 (Sent 1 ads in response to query)
10/7 09:40:47 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:34011> for command 5 (QUERY_STARTD_ADS)
10/7 09:40:51 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:34012> for command 5 (QUERY_STARTD_ADS)
10/7 09:41:08 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:32806> for command 2 (UPDATE_MASTER_AD)

so it appears that the 2nd machine is trying to connect, and is being
refused, which leads me to believe that i somehow have a mistake in the
config file....
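
A quick way to see which host and which commands are being refused is to tally the PERMISSION DENIED entries in the CollectorLog. The sketch below is only an illustration: the function name denied_commands is made up here, and the regular expression assumes the entries look like the ones quoted above once the wrapped lines are rejoined.

```python
import re
from collections import Counter

# Matches a DaemonCore denial entry once its wrapped lines are joined.
DENIED = re.compile(
    r"PERMISSION DENIED to \S+ user from host\s+<([\d.]+):\d+>"
    r"\s+for command \d+ \((\w+)\)"
)


def denied_commands(log_text):
    """Count (host, command) pairs that the collector refused."""
    # The mail client wrapped long log lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return Counter(DENIED.findall(flat))


sample = """\
10/7 09:36:08 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:32806> for command 2 (UPDATE_MASTER_AD)
10/7 09:36:16 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:32806> for command 1 (UPDATE_SCHEDD_AD)
10/7 09:40:47 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.52:34011> for command 5 (QUERY_STARTD_ADS)
"""

for (host, command), count in sorted(denied_commands(sample).items()):
    print(host, command, count)
```

Every denial in the excerpt comes from 192.168.1.52, and both update and query commands are refused. That pattern would be consistent with the host not being covered by the central manager's host-based allow lists (HOSTALLOW_READ and HOSTALLOW_WRITE in condor_config), though that is an inference from the log and not something confirmed in this thread.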

i have a 'condor' user on both machines. querying google hasn't shed any
real light/understanding on this issue...

thanks

-bruce


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx]
Sent: Thursday, October 07, 2004 9:30 AM
To: 'Condor-Users Mail List'
Subject: RE: [Condor-users] linux configuration/condor configuration....


ok....

now it appears i've really screwed things up....

running 'condor_status' bombs... in that it fails to connect to the
collector...

[root@lserver5 etc]# condor_status
CEDAR:6001:Failed to connect to <192.168.1.55:9618>
Error: Couldn't contact the condor_collector on lserver5.

however, i can ping lserver5 (which is the machine itself)
[root@lserver5 etc]# ping lserver5
PING lserver5 (192.168.1.55) from 192.168.1.55 : 56(84) bytes of data.
64 bytes from lserver5 (192.168.1.55): icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from lserver5 (192.168.1.55): icmp_seq=2 ttl=64 time=0.034 ms

so.. what gives...

i changed the condor_config.local file to add the network_interface

######################################################################
##  Local settings
######################################################################
######################################################################

##  Place your own local configuration settings for your central
##  manager here.
NETWORK_INTERFACE = 192.168.1.55

------------------------

i changed the /etc/hosts file to be:
[root@lserver5 etc]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.1.55            lserver5
192.168.1.57            lserver7

and now.. it appears i'm worse off than i was earlier....

any ideas/suggestions....

thanks...

bruce


ps. the /home/condor/log/MasterLog displays....
--------------------
10/7 09:17:06 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.55:33806> for command 60005 (DC_OFF_GRACEFUL)
10/7 09:17:13 DaemonCore: Command received via TCP from host
<192.168.1.55:33807>
10/7 09:17:13 DaemonCore: received command 60004 (DC_RECONFIG), calling
handler (handle_reconfig())
10/7 09:17:13 Reconfiguring all running daemons.
10/7 09:17:13 Sent SIGHUP to COLLECTOR (pid 1428)
10/7 09:17:13 Sent SIGHUP to NEGOTIATOR (pid 1429)
10/7 09:17:13 Sent SIGHUP to STARTD (pid 1430)
10/7 09:17:13 Sent SIGHUP to SCHEDD (pid 1431)
10/7 09:17:13 Can't connect to <192.168.1.55:9618>:0, errno = 111
10/7 09:17:13 Will keep trying for 10 seconds...
10/7 09:17:23 Connect failed for 10 seconds; returning FALSE
10/7 09:17:23 ERROR:
SECMAN:2003:TCP connection to <192.168.1.55:9618> failed

10/7 09:17:23 Can't send UPDATE_MASTER_AD to collector lserver5
<192.168.1.55:9618>: Failed to send UDP update command to collector
10/7 09:17:26 DaemonCore: PERMISSION DENIED to unknown user from host
<192.168.1.55:33844> for command 453 (RESTART)
10/7 09:19:24 Can't connect to <192.168.1.55:9618>:0, errno = 111
10/7 09:19:24 Will keep trying for 10 seconds...
10/7 09:19:34 Connect failed for 10 seconds; returning FALSE
10/7 09:19:34 ERROR:
SECMAN:2003:TCP connection to <192.168.1.55:9618> failed

10/7 09:19:34 Can't send UPDATE_MASTER_AD to collector lserver5
<192.168.1.55:9618>: Failed to send UDP update command to collector
10/7 09:24:34 Can't connect to <192.168.1.55:9618>:0, errno = 111
10/7 09:24:34 Will keep trying for 10 seconds...
10/7 09:24:44 Connect failed for 10 seconds; returning FALSE
10/7 09:24:44 ERROR:
SECMAN:2003:TCP connection to <192.168.1.55:9618> failed


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx]
Sent: Thursday, October 07, 2004 9:08 AM
To: 'roy hill (IGER-WP)'; 'Condor-Users Mail List'
Subject: RE: [Condor-users] linux configuration....


roy,

how can i check/change the permissions to be read/written by 'condor'
without screwing it up for other apps...
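
One way to check this without changing anything is to look at the file's mode bits. The sketch below is a minimal example, not a Condor tool: world_readable is a hypothetical helper name, and it is demonstrated on a temporary file so that nothing on the real system is touched.

```python
import os
import stat
import tempfile


def world_readable(path):
    """Return True if 'other' users have read permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


# Demonstrate on a throwaway file rather than the real /etc/hosts.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o600)          # owner-only: other accounts cannot read
private_result = world_readable(demo_path)
os.chmod(demo_path, 0o644)          # owner read/write, everyone else read
public_result = world_readable(demo_path)
os.remove(demo_path)

print(private_result, public_result)
```

On the real machine the call would be world_readable("/etc/hosts"). A root-owned file with mode 0644 is readable by every account, including 'condor', and stays writable only by root, so other applications should not be affected.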

thanks...

-bruce


-----Original Message-----
From: roy hill (IGER-WP) [mailto:roy.hill@xxxxxxxxxxx]
Sent: Thursday, October 07, 2004 8:45 AM
To: bedouglas@xxxxxxxxxxxxx; Condor-Users Mail List
Subject: RE: [Condor-users] linux configuration....


Bruce,

Check the permissions on the hosts file; it needs to be readable by
your Condor account.

Best regards,
Roy.

-----Original Message-----
From: bruce [mailto:bedouglas@xxxxxxxxxxxxx] 
Sent: 07 October 2004 16:38
To: 'Condor-Users Mail List'
Subject: [Condor-users] linux configuration....


hi...

i have managed to get condor up/running on two linux boxes. however,
when i attempt to do a 'condor_status' on the 'central manager' it shows
only one machine. an examination of the /home/condor/log/CollectorLog
file shows a warning referring to the /etc/hosts file.

it appears that the 2nd machine is not able to 'see' the 'central
manager' condor...

10/7 07:54:19 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
10/7 07:54:19 ** /opt/condor-6.6.6/sbin/condor_collector
10/7 07:54:19 ** $CondorVersion: 6.6.6 Jul 26 2004 $
10/7 07:54:19 ** $CondorPlatform: I386-LINUX_RH9 $
10/7 07:54:19 ** PID = 1428
10/7 07:54:19 ******************************************************
10/7 07:54:19 Using config file: /opt/condor-6.6.6/etc/condor_config
10/7 07:54:19 Using local config files: /home/condor/condor_config.local
10/7 07:54:19 DaemonCore: Command Socket at <127.0.0.1:9618>
10/7 07:54:19 WARNING: Condor is running on the loopback address (127.0.0.1)
10/7 07:54:19          of this machine, and is not visible to other hosts!
10/7 07:54:19          This may be due to a misconfigured /etc/hosts file.
10/7 07:54:19          Please make sure your hostname is not listed on the
10/7 07:54:19          same line as localhost in /etc/hosts.
10/7 07:54:19 In ViewServer::Init()
10/7 07:54:19 In CollectorDaemon::Init()
10/7 07:54:19 In ViewServer::Config()
10/7 07:54:19 In CollectorDaemon::Config()
10/7 07:54:19 enable: Creating stats hash table
10/7 07:54:19 (Sent 0 ads in response to query)
10/7 07:54:19 Got QUERY_STARTD_PVT_ADS
10/7 07:54:19 (Sent 0 ads in response to query)
10/7 07:54:20 WARNING:  No master ad for < localhost.localdomain >
10/7 07:54:20 ScheddAd     : Inserting ** "< localhost.localdomain ,
127.0.0.1 >"
10/7 07:54:20 stats: Inserting new hashent for
'Schedd':'localhost.localdomain':'127.0.0.1'
10/7 07:54:24 ** Master < localhost.localdomain > rejuvenated from recently down
10/7 07:54:24 stats: Inserting new hashent for
'Master':'localhost.localdomain':'127.0.0.1'
10/7 07:54:32 StartdAd     : Inserting ** "< localhost.localdomain ,
127.0.0.1 >"
10/7 07:54:32 stats: Inserting new hashent for
'Start':'localhost.localdomain':'127.0.0.1'
10/7 07:54:32 StartdPvtAd  : Inserting ** "< localhost.localdomain ,
127.0.0.1 >"
10/7 07:54:32 stats: Inserting new hashent for
'StartdPvt':'localhost.localdomain':'127.0.0.1'

the /etc/hosts file is:
[root@lserver5 etc]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               lserver5 localhost.localdomain localhost
192.168.1.57            lserver7
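
The collector warning above says the hostname should not sit on the same line as localhost. A rough way to spot that condition is to scan the hosts file for non-localhost names mapped to a 127.x.x.x address. This is only a sketch under that assumption; loopback_aliases is a hypothetical helper, and the second sample is the corrected layout with lserver5 on its own 192.168.1.55 line.

```python
def loopback_aliases(hosts_text):
    """Return names mapped to a 127.x.x.x address, excluding localhost names."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then skip blank or address-only lines.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        if address.startswith("127."):
            flagged.extend(n for n in names if not n.startswith("localhost"))
    return flagged


before = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               lserver5 localhost.localdomain localhost
192.168.1.57            lserver7
"""

after = """\
127.0.0.1               localhost.localdomain localhost
192.168.1.55            lserver5
192.168.1.57            lserver7
"""

print(loopback_aliases(before))
print(loopback_aliases(after))
```

With the file as shown above, the helper flags lserver5; with the hostname moved to its own line under the machine's real address, nothing is flagged.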

is there something that i should do to the /etc/hosts file? and why am i
able to ping/access the 'central manager' machine (lserver5) from the
client machine (lserver2) by simply running 'ping lserver5'?

i can provide the relevant portion of the 'condor_config' file if
needed.

i'm really at a loss as to how to proceed!!!!!!!

thanks...

-bruce





_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users