
Re: [condor-users] condor_collector problem with 6.6.0 under IRIX



It seems that your master doesn't know that it should start the COLLECTOR
and NEGOTIATOR. Check that DAEMON_LIST in your local configuration file
contains both COLLECTOR and NEGOTIATOR.
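For a central manager that also runs the daemons your MasterLog shows
starting (startd, schedd and kbdd), the entry might look roughly like
the sketch below. The exact list is an assumption based on that log, so
adjust it to whatever the machine should actually run. The file is the
local config file named in your log, /usr/condor/condor_config.local.

```
## Sketch only: daemons condor_master should start on the central manager.
## COLLECTOR and NEGOTIATOR are the two entries that appear to be missing;
## the remaining names are assumed from the daemons seen in the MasterLog.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD, KBDD
```

Running "condor_config_val DAEMON_LIST" on that machine should print the
value Condor actually ends up with, which is a quick way to check whether
a later setting overrides the one you expect.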
Mark
On Fri, 2003-11-21 at 17:32, Mark Calleja wrote:
> Hi chaps,
> 
> May I take this opportunity to further display my ignorance in this
> forum. I'm upgrading our 6.4.5 pool to 6.6.0, with the master node being
> an SGI O2 running IRIX 6.5, which has given valiant service in this role
> with 6.4.5 for nearly a year. The upgrade installation goes swimmingly,
> but on issuing condor_master all the relevant daemons come up except
> condor_collector, so running condor_q -global gives the relevant error
> message. The MasterLog has the following entry:
> 
> 
> 11/21 14:55:02 ******************************************************
> 11/21 14:55:02 ** condor_master (CONDOR_MASTER) STARTING UP
> 11/21 14:55:02 ** $CondorVersion: 6.6.0 Nov 14 2003 $
> 11/21 14:55:02 ** $CondorPlatform: SGI-IRIX65 $
> 11/21 14:55:02 ** PID = 277284
> 11/21 14:55:02 ******************************************************
> 11/21 14:55:02 Using config file:
> /pond/home/condor/IRIX/current_release/etc/condor_config
> 11/21 14:55:02 Using local config files: /usr/condor/condor_config.local
> 11/21 14:55:02 DaemonCore: Command Socket at <131.111.41.187:9633>
> 11/21 14:55:02 Started DaemonCore process
> "/home/condor/IRIX/current_release/sbin/condor_startd", pid and pgroup =
> 278251
> 11/21 14:55:02 Started DaemonCore process
> "/home/condor/IRIX/current_release/sbin/condor_schedd", pid and pgroup =
> 278172
> 11/21 14:55:02 Started DaemonCore process
> "/home/condor/IRIX/current_release/sbin/condor_kbdd", pid and pgroup =
> 276100
> 11/21 14:55:07 Can't connect to <131.111.41.187:9618>:0, errno = 146
> 11/21 14:55:07 Will keep trying for 10 seconds...
> 11/21 14:55:17 Connect failed for 10 seconds; returning FALSE
> 11/21 14:55:17 ERROR:
> SECMAN:2003:TCP connection to <131.111.41.187:9618> failed
> 
> 
> Now an errno of 146 maps to ECONNREFUSED according to <sys/errno.h>, so
> on the chance that this port's already been bagged by some other
> application I ran netstat -a before and after starting condor and looked
> at which ports are being used. Port 9618 is not being used at all, and
> the only effect of running condor is to use the following ports:
> 
> > tcp     0      0  silica.9633        *.*                 LISTEN
> > tcp     0      0  silica.9657        *.*                 LISTEN
> > tcp     0      0  silica.9617        *.*                 LISTEN
> > tcp     0      0  silica.9609        *.*                 LISTEN
> > tcp     0      0  silica.9610        *.*                 LISTEN
> > udp     0      0  silica.9603        *.*
> > udp     0      0  silica.9609        *.*
> > udp     0      0  silica.9610        *.*
> > udp     0      0  silica.9617        *.*
> > udp     0      0  silica.9633        *.*
> > udp     0      0  silica.9657        *.*
> 
> Not only is port 9618 not being used, but I can't even see port 9614
> being taken by the negotiator. I should point out that I haven't altered
> the port ranges that condor uses from the defaults.
> 
> Can any of you chaps spot anything that I'm obviously missing?
> 
> Thanks for any help,
> 
> Mark
> 
> 
> 
> Condor Support Information:
> http://www.cs.wisc.edu/condor/condor-support/
> To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> unsubscribe condor-users <your_email_address>
