
[Condor-users] New User: Condor_status problem



Hi,
I've just installed Condor 6.8.5 on a small test network of two PCs, both running Windows XP. PC1 is the central manager and only submits jobs, whilst PC2 executes them. When I run 'condor_status', I only see PC2 in the pool. Running 'condor_status -direct <PC1_name>' gives me the error:
"CEDAR:6001:Failed to connect to <192.168.xx.xxx:2907>"
"Error: Couldn't contact the condor_collector on <192.168.xx.xxx:2907> Extra Info...."
It seems strange to me that the collector (running on PC1) is communicating with PC2, but not with PC1 itself.
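
For anyone who wants to reproduce the "Failed to connect" part outside of Condor, a plain TCP probe against the collector's address and port is one option. The following is only a minimal sketch, assuming Python is available on the machine. The function name can_connect and the script name in the usage line are made up for illustration, and the host and port are passed on the command line because the real address is not shown here.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    This only tests reachability of the port. It does not speak the
    Condor protocol, so success does not prove the collector is healthy.
    """
    try:
        # create_connection resolves the host name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Hypothetical usage: python probe.py <collector_host> <collector_port>
if __name__ == "__main__" and len(sys.argv) == 3:
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if can_connect(target_host, target_port) else "NOT reachable"
    print(f"{target_host}:{target_port} is {state}")
```

If the probe fails from PC1 but succeeds from PC2 (or the other way round), that would point towards a network or firewall issue rather than the Condor configuration.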

I have checked that the condor_collector is running, and it is. I have also turned off the firewalls on both PCs, tried manually opening port 2907 on PC1, and checked that DAEMON_LIST in my condor_config file contains all the daemons (master, collector, negotiator and schedd, plus startd for PC2), that COLLECTOR_HOST is set to $(CONDOR_HOST), i.e. PC1, and that HOSTALLOW_READ and HOSTALLOW_WRITE both contain PC1. Still no good. I have also checked the log files; the last few lines of CollectorLog and MasterLog are included at the end of this message.

Also, after I do a 'condor_restart', running 'condor_status' just echoes a blank line (on both PC1 and PC2) for about 15 minutes, after which it goes back to showing only PC2 in the pool.

I have read and re-read the manual and the user-lists and can't find any answers. Would really appreciate any ideas on what I may be doing wrong.

Many thanks in advance.

Bobane
Bristol -UK.
.................................

NegotiatorAd : Inserting ** "< PC1.mydomain >"
7/27 17:28:38 DC_AUTHENTICATE: attempt to open invalid session PC1:3716:xxxxxxxxxx:11, failing.
7/27 17:32:46 (Sending 7 ads in response to query)
7/27 17:32:46 Got QUERY_STARTD_PVT_ADS
7/27 17:32:46 (Sending 1 ads in response to query)
7/27 17:32:46 NegotiatorAd : Inserting ** "< PC1.mydomain>"
7/27 17:33:17 Got QUERY_STARTD_ADS
7/27 17:33:17 (Sending 1 ads in response to query)
7/27 17:33:37 ** Master < mesh1 > rejuvenated from recently down
7/27 17:33:37 stats: Inserting new hashent for 'Master':'PC2':'192.168.xx.xxx'
7/27 17:37:46 (Sending 8 ads in response to query)
7/27 17:37:46 Got QUERY_STARTD_PVT_ADS
7/27 17:37:46 (Sending 1 ads in response to query)
7/27 17:37:46 NegotiatorAd : Inserting ** "< PC1.mydomain >"

And from MasterLog:

7/27 17:14:01 ******************************************************
7/27 17:14:01 ** condor_master (CONDOR_MASTER) STARTING UP
7/27 17:14:01 ** C:\condor\bin\condor_master.exe
7/27 17:14:01 ** $CondorVersion: 6.8.5 May 17 2007 $
7/27 17:14:01 ** $CondorPlatform: INTEL-WINNT50 $
7/27 17:14:01 ** PID = 1204
7/27 17:14:01 ** Log last touched 7/27 17:13:46
7/27 17:14:01 ******************************************************
7/27 17:14:01 Using config source: C:\condor\condor_config
7/27 17:14:01 Using local config sources:
7/27 17:14:01 C:\condor/condor_config.local
7/27 17:14:01 DaemonCore: Command Socket at <192.168.0.xxx>
7/27 17:14:01 Started DaemonCore process "C:\condor/bin/condor_collector.exe", pid and pgroup = 2204
7/27 17:14:01 Started DaemonCore process "C:\condor/bin/condor_negotiator.exe", pid and pgroup = 1332
7/27 17:14:01 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2544