
[Condor-users] condor installation



Hello,
I have installed Condor on three servers: one host is the manager as well as a submit and execute node, and the other two are submit and execute nodes.
All the daemons run with no errors reported in the log files, but the submit/execute nodes only report the master node's CPUs when I type condor_status.

The same is true if I type condor_status on the master.

If I type condor_status -direct host, this works when I name the local host, but otherwise it fails with the following error:

#################################################################################
CEDAR:6001:Failed to connect to <152.15.98.25:52357>
Error: Couldn't contact the condor_collector on <152.15.98.25:52357>.

Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines and
jobs in the Condor pool. The condor_collector might not be running, it might
be refusing to communicate with you, there might be a network problem, or
there may be some other problem. Check with your system administrator to fix
this problem.

If you are the system administrator, check that the condor_collector is
running on <152.15.98.25:52357>, check the HOSTALLOW configuration in your
condor_config, and check the MasterLog and CollectorLog files in your log
directory for possible clues as to why the condor_collector is not
responding. Also see the Troubleshooting section of the manual.
#################################################################################

That is the error I get when I try to explicitly view the status of the master.

There are no errors reported in the MasterLog or CollectorLog files, but I don't know what I should be looking for.

I used the "newer" installation setup, and I don't have a common shared file system or a common UID system.

What could it be?

I have opened port 9618. Do I need to open any other ports?


I get this error in the SchedLog on the slave nodes:

#################################################################################
12/22 21:27:27 (pid:31214) Failed to start non-blocking update to <152.15.98.25:9618>.
12/22 21:32:27 (pid:31214) attempt to connect to < 152.15.98.25:9618> failed: Connection refused (connect errno = 111).
12/22 21:32:27 (pid:31214) ERROR: SECMAN:2003:TCP connection to <152.15.98.25:9618> failed
#################################################################################
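
The "Connection refused" in the log above can be checked outside of Condor. The following is only a minimal sketch, assuming Python is available on the submit/execute nodes; check_port is a hypothetical helper, not part of Condor. It tests nothing more than whether a plain TCP connection to a given address succeeds.

```python
import socket
from typing import Tuple


def check_port(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Try to open a plain TCP connection to host:port.

    Hypothetical helper for illustration only. It reports whether the
    connection succeeded and, if not, a short reason. It does not speak
    the Condor protocol; it only checks TCP reachability.
    """
    try:
        # create_connection resolves the host and connects, honoring the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # Host is reachable, but nothing accepts connections on that port
        # (or a firewall actively rejects them): errno 111 on Linux.
        return False, "connection refused"
    except socket.timeout:
        # No answer at all, for example a firewall silently dropping packets
        return False, "timed out"
    except OSError as exc:
        # Anything else: unreachable network, name resolution failure, ...
        return False, str(exc)
```

Run from one of the slave nodes, check_port("152.15.98.25", 9618) would show whether the port from the log is reachable from that node. A refusal usually means the host answered but nothing accepted the connection on that port, while a timeout suggests the packets are being dropped on the way.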

Thanks for any help.