
RE: [Condor-users] Fedora 3 collector problem



What about the firewall?

FC3 enables iptables by default. Are you allowing both TCP and UDP
through on the port range Condor uses?
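
One thing to keep in mind: your MasterLog says "Will use UDP to update
collector", so being able to telnet to the collector port only shows that
TCP gets through. As a rough sketch, here is a small Python check for the
TCP side that you can run from both the master and a client. The helper
name tcp_port_open and the port 9618 are assumptions on my part (9618 is
the usual collector default; use whatever your configuration actually
sets). It does not test UDP, which needs a look at the firewall rules
themselves.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: the collector listens on 9618 on this host.
    # Replace "localhost" with the hostname of your master.
    host, port = "localhost", 9618
    state = "reachable" if tcp_port_open(host, port) else "NOT reachable"
    print(f"TCP {host}:{port} is {state}")
```

If this reports the port as reachable but the daemons still fail to
connect, listing the active rules with "iptables -L -n" on the master
should show whether UDP is being dropped.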

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Joshua Juen
> Sent: Tuesday, 31 May 2005 11:08 PM
> To: Jose D. Zamora
> Cc: Condor-Users Mail List
> Subject: Re: [Condor-users] Fedora 3 collector problem
> 
> 
> Those files checked out ok. Still not sure what is happening.
> 
> Also, if I try to run condor_status from fedora (my master) 
> it says that it cannot connect to the collector, but 
> condor_status does work from my clients.
> 
> Thanks
> Josh
> 
> On 5/31/05, Jose D. Zamora <jzamora@xxxxxxxxxxxx> wrote:
> > Check file :
> > /etc/condor/condor_config
> > for :
> > COLLECTOR_HOST  = $(CONDOR_HOST)
> > DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR
> > and
> > Check file: /opt/condor-6.6.9/local.phy-condor/condor_config.local
> > for :
> > COLLECTOR_NAME = Collector at <hostname of your master here>
> > 
> > Hope this helps
> > 
> > On Tue, 31 May 2005 09:21:18 -0500, Joshua Juen <jj9867@xxxxxxxxx> 
> > wrote:
> > 
> > > I have set up condor as master on a Fedora 3 system. The 
> > > installation seems to be working except that the master cannot 
> > > find the collector.
> > >
> > > The condor_status works from the client machines but none of the 
> > > machines can submit jobs. The submitting machine's jobs will just 
> > > sit in the queue.
> > >
> > > Error sending update to the collector : Failed to connect to 
> > > collector appears in the master log, the negotiator log and the 
> > > start log.
> > >
> > > The port that the collector should be on is open and I can telnet 
> > > into it. (I am assuming that the clients can also) but the master 
> > > can't seem to find it.
> > >
> > > I think that the problem is probably a simple configuration error 
> > > but I can not seem to track it down.
> > >
> > > Any help would be greatly appreciated,
> > > Thanks
> > > Josh
> > >
> > >
> > > MasterLog
> > >
> > > 5/31 08:24:19 ******************************************************
> > > 5/31 08:24:19 ** condor_master (CONDOR_MASTER) STARTING UP
> > > 5/31 08:24:19 ** /opt/condor-6.6.9/sbin/condor_master
> > > 5/31 08:24:19 ** $CondorVersion: 6.6.9 Mar 10 2005 $
> > > 5/31 08:24:19 ** $CondorPlatform: I386-LINUX_RH9 $
> > > 5/31 08:24:19 ** PID = 2354
> > > 5/31 08:24:19 ******************************************************
> > > 5/31 08:24:19 Using config file: /etc/condor/condor_config
> > > 5/31 08:24:19 Using local config files: /opt/condor-6.6.9/local.phy-condor/condor_config.local
> > > 5/31 08:24:19 Attempting to lock 
> > > /tmp/condor-lock.phy-condor0.606384916537539/InstanceLock.
> > > 5/31 08:24:19 Obtained lock on 
> > > /tmp/condor-lock.phy-condor0.606384916537539/InstanceLock.
> > > 5/31 08:24:19 DaemonCore: Command Socket at <xxx.xxx.xxx.50:32769>
> > > 5/31 08:24:19 SEC_DEFAULT_SESSION_DURATION is undefined, using default value of 3600
> > > 5/31 08:24:19 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:24:19 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:24:19 Will use UDP to update collector
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_collector", pid and pgroup = 2355
> > > 5/31 08:24:19 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_negotiator", pid and pgroup = 2356
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_startd", pid and pgroup = 2357
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_schedd", pid and pgroup = 2358
> > > 5/31 08:24:21 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:21 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:24:21 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:21 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:24:22 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:22 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:24:24 enter Daemons::CheckForNewExecutable
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_master: 1110456335
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_collector: 1110456335
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_negotiator: 1110456334
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_startd: 1110456334
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_schedd: 1110456334
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:24:24 exit Daemons::CheckForNewExecutable
> > > 5/31 08:24:24 enter Daemons::UpdateCollector
> > > 5/31 08:24:24 Attempting to send update via UDP to collector
> > > 5/31 08:24:24 Can't send UPDATE_MASTER_AD to collector : Failed to
> > > connect to collector
> > > 5/31 08:24:33 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:33 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:29:24 enter Daemons::UpdateCollector
> > > 5/31 08:29:24 Attempting to send update via UDP to collector
> > > 5/31 08:29:24 Can't send UPDATE_MASTER_AD to collector : Failed to
> > > connect to collector
> > > 5/31 08:29:24 enter Daemons::CheckForNewExecutable
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_master: 1110456335
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_collector: 1110456335
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_negotiator: 1110456334
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_startd: 1110456334
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_schedd: 1110456334
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:29:24 exit Daemons::CheckForNewExecutable
> > > 5/31 08:34:24 enter Daemons::CheckForNewExecutable
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_master: 1110456335
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_collector: 1110456335
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_negotiator: 1110456334
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_startd: 1110456334
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_schedd: 1110456334
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:34:24 exit Daemons::CheckForNewExecutable
> > > 5/31 08:34:24 enter Daemons::UpdateCollector
> > > 5/31 08:34:24 Attempting to send update via UDP to collector
> > > 5/31 08:34:24 Can't send UPDATE_MASTER_AD to collector : Failed to
> > > connect to collector
> > > 5/31 08:35:07 DaemonCore: Command received via TCP from host
> > > <xxx.xxx.xxx.50:32777>
> > > 5/31 08:35:07 DaemonCore: received command 453 (RESTART), calling
> > > handler (admin_command_handler)
> > > 5/31 08:35:07 Got admin command (453) and allowing it.
> > > 5/31 08:35:07 NumberOfChildren() returning 4
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:35:07 Sent SIGTERM to COLLECTOR (pid 2355)
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:35:07 Sent SIGTERM to NEGOTIATOR (pid 2356)
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:35:07 Sent SIGTERM to STARTD (pid 2357)
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 08:35:07 Sent SIGTERM to SCHEDD (pid 2358)
> > > 5/31 08:35:07 DaemonCore: No more children processes to reap.
> > > 5/31 08:35:07 The COLLECTOR (pid 2355) exited with status 0
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2355 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2355 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 3
> > > 5/31 08:35:07 The NEGOTIATOR (pid 2356) exited with status 0
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2356 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2356 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 2
> > > 5/31 08:35:07 DaemonCore: No more children processes to reap.
> > > 5/31 08:35:07 The STARTD (pid 2357) exited with status 0
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2357 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2357 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 1
> > > 5/31 08:35:07 The SCHEDD (pid 2358) exited with status 0
> > > 5/31 08:35:07 ProcAPI: pid 2418 does not exist.
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2358 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2358 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 0
> > > 5/31 08:35:07 All daemons are gone.  Restarting.
> > > 5/31 08:35:07 Restarting master right away.
> > > 5/31 08:35:07 Doing exec( "/opt/condor-6.6.9/sbin/condor_master" )
> > > 5/31 08:35:07 getExecPath: readlink("/proc/self/exe") failed: errno 13 (Permission denied)
> > >
> > > 5/31 08:35:07 PASSWD_CACHE_REFRESH is undefined, using default value of 300
> > >
> > > StartLog error:
> > >
> > > 5/31 09:05:37 Attempting to send update via UDP to collector
> > > 5/31 09:05:37 Error sending update to the collector : Failed to connect to collector
> > > 5/31 09:05:37 Error sending update to collector(s)
> > >
> > > Negotiator Sample:
> > >
> > > 5/31 09:05:07 ---------- Started Negotiation Cycle ----------
> > > 5/31 09:05:07 Phase 1:  Obtaining ads from collector ...
> > > 5/31 09:05:07   Getting all public ads ...
> > > 5/31 09:05:07 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 09:05:07 Couldn't fetch ads: can't find collector
> > > 5/31 09:05:07 Aborting negotiation cycle
> > >
> > > _______________________________________________
> > > Condor-users mailing list
> > > Condor-users@xxxxxxxxxxx 
> > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > 
> > 
> > 
> > --
> > 
> >
> 