
Re: [Condor-users] Can't find address of local schedd



Marcelo,
The errors you are getting could have several causes, so below
is a more detailed process to help you debug them:
> $ condor_status
> CEDAR:6001:Failed to connect to <xxx.xx.xxx.xx:xxxx>
> Error: Couldn't contact the condor_collector on cluster-name.domain
>
> Extra Info: the condor_collector is a process that runs on the central
...
> responding. Also see the Troubleshooting section of the manual.

This error indicates that the condor_status command couldn't
communicate with the collector. This most likely means:
(1) the collector (and the condor_master/other daemons) isn't running
on the central manager,
(2) the collector is running, but not on the server the command thinks
it is, or
(3) the collector is running where condor_status thinks it is, but
condor_status doesn't have permission to talk with it.

To rule out #1, start condor_master on the central manager of the pool
(the head node for the cluster) and then run:
$ ps -ef | grep condor
Do condor_master and condor_collector show up in the output?
This command should tell you which directory the log files are located in:
$ condor_config_val -config -verbose LOG
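
As a rough sketch, the process check above can be wrapped in a small
shell function that looks for each daemon by name instead of reading the
grep output by eye. The name check_daemons is made up for this example
and is not part of Condor; it reads a "ps -ef" style listing on standard
input.

```shell
# check_daemons: hypothetical helper, not a Condor tool.
# Reads a process listing on stdin and reports, for each daemon name,
# whether any line of the listing mentions it.
check_daemons() {
    listing=$(cat)
    for daemon in condor_master condor_collector; do
        if printf '%s\n' "$listing" | grep -q "$daemon"; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
        fi
    done
}
```

On the central manager it would be used as "ps -ef | check_daemons".
This is a plain substring match, so an editor with a file named after a
daemon open would also count as a hit.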

To check for option #2, determine where the collector should be by running:
$ condor_config_val -verbose COLLECTOR_HOST
Does this match the machine you expect to be the central manager?
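
If you want to script that comparison, here is a minimal sketch. The
helper name same_host and the host name cm.example.org are assumptions
made for illustration. It also assumes COLLECTOR_HOST holds a single
host, optionally followed by ":port", and not a list of hosts.

```shell
# same_host CONFIGURED EXPECTED: hypothetical helper, not a Condor tool.
# Drops an optional ":port" suffix from CONFIGURED, then compares the
# two host names case-insensitively. Returns 0 when they match.
same_host() {
    configured=$(printf '%s' "${1%%:*}" | tr '[:upper:]' '[:lower:]')
    expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$configured" = "$expected" ]
}
```

For example:
$ same_host "$(condor_config_val COLLECTOR_HOST)" cm.example.org && echo match
The comparison is purely textual, so a short host name will not match
its fully qualified form.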

> I am looking for the Masterlog files, but I can't find them. Where
> they are suppose to be? The troubleshooting section of the manual
> doesn't help.
To find where the master log is located, run:
$ condor_config_val MASTER_LOG

> The condor_master command doesn't help too:
>
> # condor_master
condor_master merely starts the condor_master daemon, which, on the
central manager for the pool (see the COLLECTOR_HOST setting), should
start the collector and other daemons.

For situation #3, do you see permission-denied errors in the log files?
If so, checking the HOSTALLOW_READ settings on the central manager will
be the next step:
http://www.cs.wisc.edu/condor/manual/v7.2/3_6Security.html#sec:Host-Security
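
A quick way to look for those errors is to count the matching lines in
a log. The sketch below is only illustrative: count_denials is a made-up
name, and the exact wording Condor uses for a denial may differ between
versions, so the match is a case-insensitive search for "permission
denied".

```shell
# count_denials: hypothetical helper, not a Condor tool.
# Prints how many lines on stdin contain "permission denied",
# ignoring case. grep -c exits non-zero when the count is 0, so
# "|| true" keeps the function's exit status clean in that case.
count_denials() {
    grep -ci 'permission denied' || true
}
```

Assuming the collector log follows the same naming as MASTER_LOG above,
you could run:
$ count_denials < "$(condor_config_val COLLECTOR_LOG)"
$ condor_config_val HOSTALLOW_READ
A non-zero count together with a HOSTALLOW_READ value that does not
cover your submit machine would point at situation #3.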

For further help you can also set TOOL_DEBUG = D_FULLDEBUG and run
condor_status -debug.
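
If you would rather not edit the configuration file for a one-off test,
Condor also reads settings from environment variables prefixed with
_CONDOR_, so the same debug level can be set for a single command. The
wrapper name run_debug_status below is an assumption for this sketch.

```shell
# run_debug_status: hypothetical wrapper, not a Condor tool.
# Runs condor_status with TOOL_DEBUG overridden for this one call
# through the _CONDOR_ environment prefix.
run_debug_status() {
    if ! command -v condor_status >/dev/null 2>&1; then
        echo "condor_status not found in PATH" >&2
        return 127
    fi
    _CONDOR_TOOL_DEBUG=D_FULLDEBUG condor_status -debug
}
```

Calling run_debug_status should print the connection attempts to the
collector in more detail, which helps tell situations #2 and #3 apart.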

Good luck, and I hope this helps.

Best,
Jason

-- 
===================================
Jason A. Stowe
main: 888.292.5320

http://www.cyclecloud.com
http://www.cyclecomputing.com

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

Come see us at Bio-IT World in Boston!