
Re: [Condor-users] condor_master cant connect with collector



On Tue, Feb 08, 2005 at 04:43:16PM -0500, Dave Lajoie wrote:
> Hello guys!
>     I have been working on a Linux Condor deployment and ran into an
> issue.
>     Basically, the condor_master can't connect to the condor_collector.
>     It seems like condor_master is using an invalid port.
>  
> Note: the collector seems to be opening port 34287,
> whereas the master is attempting to connect on 9618.
>  
> In this case, the master could not start the collector process,
> so I started it manually in order to get some more information
> about the error.
>  
> I ran condor_init prior to running sbin/condor_master.
> Do I need to update /etc/services with some entries?
>  
> I am missing something obvious again...
> Any help is welcome (I am so close to getting it working. ;)
> Dave.
>  

Are you mixing 6.6.8 and 6.7.3 config files?

How are you starting the condor_collector? If the master starts the
condor_collector, the master will automatically start the collector
with '-p 9618'. 

If you're running the condor_collector without a master, make sure
you pass -p 9618 on the command line. (Running without a master is
not recommended.)
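
To confirm whether anything is actually listening on 9618, a quick TCP
probe can help. The following is only a rough sketch in Python, not part
of Condor; the function name check_port is made up for illustration. On
Linux, errno 111 (as seen in the master log) is ECONNREFUSED, which
normally means no process is listening on that port.

```python
import errno
import socket


def check_port(host, port, timeout=3.0):
    """Try a TCP connect to host:port and return (ok, description).

    Hypothetical helper for illustration; it is not a Condor tool.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success, otherwise an errno value
        code = sock.connect_ex((host, port))
    finally:
        sock.close()
    if code == 0:
        return True, "listening"
    if code == errno.ECONNREFUSED:
        return False, "connection refused (nothing listening on that port)"
    # Any other failure: report the symbolic errno name if known
    return False, errno.errorcode.get(code, str(code))
```

For example, check_port("192.168.10.207", 9618) would be expected to
report a refused connection while the collector is bound to some other
port, and check_port("192.168.10.207", 34287) should succeed if the
manually started collector is still running there.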

-Erik

> Here are the logs:
>  
> Collector:
> 2/8 16:09:06 ******************************************************
> 2/8 16:09:06 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
> 2/8 16:09:06 ** /NET/LINUX_SERVEUR/CONDOR/sbin/condor_collector
> 2/8 16:09:06 ** $CondorVersion: 6.6.8 Jan 27 2005 $
> 2/8 16:09:06 ** $CondorPlatform: I386-LINUX_RH9 $
> 2/8 16:09:06 ** PID = 3880
> 2/8 16:09:06 ******************************************************
> 2/8 16:09:06 Using config file: /home/condor/condor_config
> 2/8 16:09:06 Using local config files:
> /NET/LINUX_SERVEUR/CONDOR/hosts/rn207/condor_config.local
> 2/8 16:09:06 DaemonCore: Command Socket at <192.168.10.207:34287>
> 2/8 16:09:06 In ViewServer::Init()
> 2/8 16:09:06 In CollectorDaemon::Init()
> 2/8 16:09:06 In ViewServer::Config()
> 2/8 16:09:06 In CollectorDaemon::Config()
> 2/8 16:09:11 enable: Creating stats hash table
> 2/8 16:24:11 Housekeeper:  Ready to clean old ads
> 2/8 16:24:11  Cleaning StartdAds ...
> 2/8 16:24:11  Cleaning StartdPrivateAds ...
> 2/8 16:24:11  Cleaning ScheddAds ...
> 2/8 16:24:11  Cleaning SubmittorAds ...
> 2/8 16:24:11  Cleaning LicenseAds ...
> 2/8 16:24:11  Cleaning MasterAds ...
> 2/8 16:24:11  Cleaning CkptServerAds ...
> 2/8 16:24:11  Cleaning CollectorAds ...
> 2/8 16:24:11  Cleaning StorageAds ...
> 2/8 16:24:11 Housekeeper:  Done cleaning
> 2/8 16:39:11 Housekeeper:  Ready to clean old ads
> 2/8 16:39:11  Cleaning StartdAds ...
> 2/8 16:39:11  Cleaning StartdPrivateAds ...
> 2/8 16:39:11  Cleaning ScheddAds ...
> 2/8 16:39:11  Cleaning SubmittorAds ...
> 2/8 16:39:11  Cleaning LicenseAds ...
> 2/8 16:39:11  Cleaning MasterAds ...
> 2/8 16:39:11  Cleaning CkptServerAds ...
> 2/8 16:39:11  Cleaning CollectorAds ...
> 2/8 16:39:11  Cleaning StorageAds ...
> 2/8 16:39:11 Housekeeper:  Done cleaning
> 
> Master
> 2/8 16:08:33 ******************************************************
> 2/8 16:08:33 ** condor_master (CONDOR_MASTER) STARTING UP
> 2/8 16:08:33 ** /NET/LINUX_SERVEUR/CONDOR/sbin/condor_master
> 2/8 16:08:33 ** $CondorVersion: 6.6.8 Jan 27 2005 $
> 2/8 16:08:33 ** $CondorPlatform: I386-LINUX_RH9 $
> 2/8 16:08:33 ** PID = 3864
> 2/8 16:08:33 ******************************************************
> 2/8 16:08:33 Using config file: /home/condor/condor_config
> 2/8 16:08:33 Using local config files:
> /NET/LINUX_SERVEUR/CONDOR/hosts/rn207/condor_config.local
> 2/8 16:08:33 DaemonCore: Command Socket at <192.168.10.207:34244>
> 2/8 16:08:33 Started DaemonCore process
> "/NET/LINUX_SERVEUR/CONDOR/sbin/condor_startd", pid and pgroup = 3865
> 2/8 16:08:33 Started DaemonCore process
> "/NET/LINUX_SERVEUR/CONDOR/sbin/condor_schedd", pid and pgroup = 3866
> 2/8 16:08:38 Can't connect to <192.168.10.207:9618>:0, errno = 111
> 2/8 16:08:38 Will keep trying for 10 seconds...
> 2/8 16:08:48 Connect failed for 10 seconds; returning FALSE
> 2/8 16:08:48 ERROR:
> SECMAN:2003:TCP connection to <192.168.10.207:9618> failed
>  
> 2/8 16:08:48 Can't send UPDATE_MASTER_AD to collector rn207.bbfxa.com
> <192.168.10.207:9618>: Failed to send UDP update command to collector
> 2/8 16:13:48 Can't connect to <192.168.10.207:9618>:0, errno = 111
> 2/8 16:13:48 Will keep trying for 10 seconds...
> 2/8 16:13:58 Connect failed for 10 seconds; returning FALSE
> 2/8 16:13:58 ERROR:
> SECMAN:2003:TCP connection to <192.168.10.207:9618> failed
> 
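
The mismatch is visible in the logs above: the collector reports its
command socket on port 34287, while the master tries 9618. As a rough
sketch (plain Python, nothing Condor-specific; the names command_socket
and on_expected_port are made up for illustration), the port a daemon
bound to can be pulled out of its log and compared against the expected
collector port:

```python
import re

# Matches e.g. "DaemonCore: Command Socket at <192.168.10.207:34287>"
COMMAND_SOCKET = re.compile(r"DaemonCore: Command Socket at <([\d.]+):(\d+)>")


def command_socket(log_text):
    """Return (ip, port) from the first Command Socket line, or None."""
    match = COMMAND_SOCKET.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def on_expected_port(log_text, expected=9618):
    """True if the daemon's command socket is on the expected port."""
    found = command_socket(log_text)
    return found is not None and found[1] == expected
```

Run against the collector log quoted above, command_socket finds port
34287, so on_expected_port returns False. That is consistent with the
collector having been started by hand without -p 9618.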
