
[Condor-users] Condor Fedora Core 4 installation problem

I am a Linux newbie trying to install Condor for my academic project.
I have been stuck on this problem for quite a while now, and after trying to find its cause I have given up.
I installed Condor 6.6.10 on Fedora Core 4, which is running under VMware Workstation 5.0 on my laptop.
I have two Fedora Core 4 guests (one for the central manager and one for the worker node) running on a Windows host, and I installed Condor on both.
I can ping and ssh between the central manager and the worker node in both directions, so they seem to be communicating well.
These are the steps I followed to install Condor on the master node (----):
cd /usr/local/condor-6.6.10
./condor_configure --install --type=manager --owner=condor
Then I set the CONDOR_CONFIG environment variable to /usr/local/condor-6.6.10/etc/condor_config.
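In a bash-style shell, setting that variable looks roughly like the sketch below. The install directory is an assumption taken from the paths in the log further down (/usr/local/condor-6.6.10); adding the bin and sbin directories to PATH is an optional convenience, not part of the steps above.

```shell
#!/bin/sh
# Assumed install location, taken from the paths in the master log.
CONDOR_INSTALL_DIR=/usr/local/condor-6.6.10

# Condor tools and daemons read the global configuration file named by
# CONDOR_CONFIG. Put these lines in ~/.bashrc (or your shell's equivalent)
# so they are set again after a new login.
CONDOR_CONFIG="$CONDOR_INSTALL_DIR/etc/condor_config"
export CONDOR_CONFIG

# Optional: make user tools (bin, e.g. condor_status) and daemons
# (sbin, e.g. condor_master) callable without the ./ prefix.
PATH="$CONDOR_INSTALL_DIR/bin:$CONDOR_INSTALL_DIR/sbin:$PATH"
export PATH

echo "CONDOR_CONFIG=$CONDOR_CONFIG"
```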
I made the following changes to the condor_config.local file:
START, PREEMPT, SUSPEND and VACATE are all set to TRUE
I also made this change to the condor_config file:
MEMORY = 512
Then I start the master daemon and run "ps aux | egrep condor_", and I can see the required daemons running.
I checked the log files and everything is fine; there are no errors.
./condor_status shows me the central manager as available.
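The "ps aux | egrep condor_" check can be made a little more precise with a small helper that compares exact program names, so the egrep process itself never counts as a match. daemon_running is a hypothetical name, and the sketch assumes a procps-style ps that accepts -e -o args=. It matches on the first word of each command line instead of the comm field because Linux truncates comm to 15 characters, which would cut off condor_collector and condor_negotiator.

```shell
#!/bin/sh
# daemon_running NAME: succeed if some process was started as NAME,
# comparing against the basename of the first word of its command line.
daemon_running() {
    ps -e -o args= |
        awk '{ n = $1; sub(".*/", "", n); print n }' |
        grep -qx -- "$1"
}

# Daemons a central manager normally runs.
for d in condor_master condor_collector condor_negotiator; do
    if daemon_running "$d"; then
        echo "$d: running"
    else
        echo "$d: NOT running"
    fi
done
```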
On the worker node, the other Fedora Core 4 guest OS ( ), I follow the same steps to install Condor,
except that when configuring the worker node I use the command
./condor_configure --install --type=submit,execute --central-manager= --owner=condor
The remaining steps are the same. The condor_master, condor_schedd and condor_startd daemons start properly.
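As far as I understand, condor_configure records the --central-manager value as CONDOR_HOST, and in the default configuration the collector address (COLLECTOR_HOST) is defined in terms of $(CONDOR_HOST). The proper way to see what the worker node actually uses is condor_config_val CONDOR_HOST. As a rough stand-in that only inspects one file, here is a hypothetical helper, config_value, that prints the last uncommented "KEY = value" assignment; unlike Condor it does not expand $(MACRO) references or read the local config file.

```shell
#!/bin/sh
# config_value FILE KEY: print the value of the last "KEY = value" line in
# FILE, skipping comment lines. No macro expansion, no include handling.
config_value() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        match($0, "^[ \t]*" key "[ \t]*=") {
            val = substr($0, RLENGTH + 1)     # text after the "="
            gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim surrounding blanks
            found = val
            seen = 1
        }
        END { if (seen) print found }
    ' "$1"
}

# Example: which central manager did condor_configure record?
# config_value /usr/local/condor-6.6.10/etc/condor_config CONDOR_HOST
```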
But when I run ./condor_status, I get the following error message:
CEDAR:6001:Failed to connect to <>
Error: Couldn't contact the condor_collector on

Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines
and jobs in the Condor pool. The condor_collector might not be running, it
might be refusing to communicate with you, there might be a network problem,
or there may be some other problem. Check with your system administrator to
fix this problem.
If you are the system administrator, check that the condor_collector is
running on, check the HOSTALLOW configuration in your
condor_config, and check the MasterLog and CollectorLog files in your
log directory for possible clues as to why the condor_collector is not
responding. Also see the Troubleshooting section of the manual.
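ping and ssh only show that ICMP and port 22 get through, while condor_status needs a TCP connection to the condor_collector, which listens on port 9618 by default. Below is a minimal reachability check, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_open is a hypothetical helper name, and the commented example uses a placeholder host because the real address is not shown here.

```shell
#!/bin/sh
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened within 5 seconds. bash makes the connection through its
# /dev/tcp/HOST/PORT redirection and closes it when it exits.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' port_open "$1" "$2" 2>/dev/null
}

# Example, run from the worker node (replace the placeholder host):
# if port_open central-manager.example 9618; then
#     echo "collector port reachable"
# else
#     echo "cannot open a TCP connection to the collector port"
# fi
```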
Here is the log output from the worker node:

2/11 10:22:37 ******************************************************
2/11 10:22:37 ** condor_master (CONDOR_MASTER) STARTING UP
2/11 10:22:37 ** /usr/local/condor-6.6.10/sbin/condor_master
2/11 10:22:37 ** $CondorVersion: 6.6.10 Jun 13 2005 $
2/11 10:22:37 ** $CondorPlatform: I386-LINUX_RH9 $
2/11 10:22:37 ** PID = 2738
2/11 10:22:37 ******************************************************
2/11 10:22:37 Using config file:
2/11 10:22:37 Using local config files:
2/11 10:22:37 DaemonCore: Command Socket at <>
2/11 10:22:37 Started DaemonCore process "/usr/local/condor-6.6.10/sbin/condor_schedd", pid and pgroup = 2739
2/11 10:22:37 Started DaemonCore process "/usr/local/condor-6.6.10/sbin/condor_startd", pid and pgroup = 2740
2/11 10:22:43 Can't connect to <>:0, errno = 113
2/11 10:22:43 Will keep trying for 10 seconds...
2/11 10:23:01 Connect failed for 10 seconds; returning FALSE
2/11 10:23:01 ERROR:
SECMAN:2003:TCP connection to <> failed

2/11 10:23:01 Can't send UPDATE_MASTER_AD to collector <>:Failed to send UDP update command to collector
2/11 10:28:01 Can't connect to <>:0, errno = 113
2/11 10:28:01 Will keep trying for 10 seconds...
2/11 10:28:13 Connect failed for 10 seconds; returning FALSE
2/11 10:28:13 ERROR:
SECMAN:2003:TCP connection to < > failed
Any advice on this would be really helpful.