
[Condor-users] Trouble on WinXP with master node



I have successfully set up Condor on two WinXP clusters.  After I added a router to each cluster, one of them stopped working properly.  The fellow who originally set up the computers had disabled several services, and after spending a lot of time on it I asked him to do a fresh reinstall, which he did (including SP2).  I have disabled the firewall to rule out those issues for now (it's an isolated LAN).  But I am getting errors I have not encountered before, and nothing is working.  In particular, I am getting errors in the masterlog on condor_read() and condor_write().  I have attached excerpts from the masterlog and collectorlog, and the condor_config file.  Any help will be appreciated, thanks.

In the config file, note that "node15" is the master node and the computer these files are from.

This happened with versions 6.6.5 and 6.6.8.

Thanks
Brad

From the masterlog:

2/1 13:32:40 ******************************************************
2/1 13:32:40 ** Condor (CONDOR_MASTER) STARTING UP
2/1 13:32:40 ** C:\Condor\bin\condor_master.exe
2/1 13:32:40 ** $CondorVersion: 6.6.8 Jan 31 2005 $
2/1 13:32:40 ** $CondorPlatform: INTEL-WINNT40 $
2/1 13:32:40 ** PID = 2096
2/1 13:32:40 ******************************************************
2/1 13:32:40 Using config file: C:\Condor\condor_config
2/1 13:32:40 Using local config files: C:\Condor/condor_config.local
2/1 13:32:40 DaemonCore: Command Socket at <192.168.1.50:1125>
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_collector.exe", pid and pgroup = 2108
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_negotiator.exe", pid and pgroup = 2120
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 2124
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_schedd.exe", pid and pgroup = 2144
2/1 13:33:05 condor_read(): timeout reading buffer.

From the collectorlog:

2/1 13:32:40 ******************************************************
2/1 13:32:40 ** condor_collector.exe (CONDOR_COLLECTOR) STARTING UP
2/1 13:32:40 ** C:\Condor\bin\condor_collector.exe
2/1 13:32:40 ** $CondorVersion: 6.6.8 Jan 31 2005 $
2/1 13:32:40 ** $CondorPlatform: INTEL-WINNT40 $
2/1 13:32:40 ** PID = 2108
2/1 13:32:40 ******************************************************
2/1 13:32:40 Using config file: C:\Condor\condor_config
2/1 13:32:40 Using local config files: C:\Condor/condor_config.local
2/1 13:32:40 DaemonCore: Command Socket at <192.168.1.50:9618>
2/1 13:32:40 In ViewServer::Init()
2/1 13:32:40 In CollectorDaemon::Init()
2/1 13:32:40 In ViewServer::Config()
2/1 13:32:40 In CollectorDaemon::Config()
2/1 13:32:55 enable: Creating stats hash table
2/1 13:33:05 (Sent 0 ads in response to query)
2/1 13:33:05 WARNING:  No master ad for < node15 >
2/1 13:33:05 ScheddAd     : Inserting ** "< node15 , 192.168.1.50 >"
2/1 13:33:05 stats: Inserting new hashent for 'Schedd':'node15':'192.168.1.50'
2/1 13:33:05 condor_write(): Socket closed when trying to write buffer
2/1 13:33:05 Buf::write(): condor_write() failed
2/1 13:33:05 SECMAN: Error sending response classad!
2/1 13:33:20 WARNING:  No master ad for < node07 >
2/1 13:33:20 ScheddAd     : Inserting ** "< node07 , 192.168.1.57 >"
2/1 13:33:20 stats: Inserting new hashent for 'Schedd':'node07':'192.168.1.57'
2/1 13:33:20 condor_write(): Socket closed when trying to write buffer
2/1 13:33:20 Buf::write(): condor_write() failed
2/1 13:33:20 SECMAN: Error sending response classad!
2/1 13:33:20 ** Master < node15 > rejuvenated from recently down
2/1 13:33:20 stats: Inserting new hashent for 'Master':'node15':'192.168.1.50'
2/1 13:33:20 condor_write(): Socket closed when trying to write buffer
2/1 13:33:20 Buf::write(): condor_write() failed
2/1 13:33:20 SECMAN: Error sending response classad!
2/1 13:33:20 ERROR: DC_AUTHENTICATE unable to receive auth_info!
2/1 13:33:20 ERROR: DC_AUTHENTICATE unable to receive auth_info!
2/1 13:33:20 ERROR: DC_AUTHENTICATE unable to receive auth_info!
2/1 13:33:20 Got QUERY_STARTD_PVT_ADS
2/1 13:33:20 (Sent 0 ads in response to query)
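
Since the trouble started after the routers were added, one quick check that might help narrow things down is whether another node can open a plain TCP connection to the collector's command socket. The following is only a minimal sketch, assuming Python is available on the nodes. The address is the one shown in the collectorlog above, and can_connect is a hypothetical helper written for this check, not part of Condor. A successful connection only shows that the port is reachable; it does not prove the Condor daemons can complete their own handshake.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Collector command socket as reported in the collectorlog excerpt
    host, port = "192.168.1.50", 9618
    status = "reachable" if can_connect(host, port) else "NOT reachable"
    print(host, port, status)
```

Running this from node07 and from node15 itself, and comparing the results, would show whether the failure depends on which side of the router the connection starts from.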