
[Condor-users] No collector -- no connection to 9618

Greetings. I recently installed Condor on a system running Red Hat Enterprise Linux Server, version 3. I used the install script rather than the RPM.

Everything seemed to install properly, so I went to the "...now what?" section of the manual. I immediately noticed that the collector and negotiator processes were NOT running on the master server.

There are numerous error messages related to this, but they all seem to come down to:

Can't connect to <169.237.mm.nn:9618>:0, errno = 111

where "169.237.mm.nn" is, of course, the IP address of the master server. See the appended excerpt from the master log file for an example.
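For reference: on Linux, errno 111 is ECONNREFUSED. The TCP connection reached the host, but nothing accepted it on port 9618 (the collector's port in this log). That usually means no process is listening there, though a firewall rule that actively rejects connections can produce the same error. The sketch below reproduces the check outside Condor. The `probe` helper is hypothetical and is not part of Condor; it only attempts a plain TCP connect and reports the resulting errno name.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connect to host:port (hypothetical helper, not a Condor tool).

    Returns "OK" if the connection is accepted, otherwise the symbolic errno
    name reported by connect(), e.g. "ECONNREFUSED" (errno 111 on Linux).
    Name-resolution failures still raise socket.gaierror.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        rc = s.connect_ex((host, port))  # returns an errno instead of raising
    if rc == 0:
        return "OK"
    return errno.errorcode.get(rc, str(rc))
```

Run on the master as, for example, `probe("169.237.mm.nn", 9618)`. A result of "ECONNREFUSED" would match the log, and "OK" would mean something is listening on the collector port.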

I've searched Google for this and found a number of reports of the same problem. I've tried all of the suggestions I found, but so far nothing has helped.

The problem doesn't seem to be a firewall or hosts.deny issue: I briefly disabled both while trying to start Condor, and the error persisted.

The only thing that may be slightly unusual is that I installed Condor into /usr/local/condor and then had to move that directory to another partition to save space on the original one. A symlink at /usr/local/condor still points to the new location.

The config file appears to be somewhere Condor can find it. The Condor binaries are in the "condor" user's path. HOSTALLOW access is granted to every address in our subnet. The master server's IP address is available both in /etc/hosts and via DNS. I've also made the condor user the owner of all the files under /usr/local/condor/..., to rule out file-access problems.

If you can think of something else I should be looking at, please let me know.


					- Mike

1/31 19:19:46 ******************************************************
1/31 19:19:46 ** condor_master (CONDOR_MASTER) STARTING UP
1/31 19:19:46 ** /scratch/condor/sbin/condor_master
1/31 19:19:46 ** $CondorVersion: 6.6.7 Oct 11 2004 $
1/31 19:19:46 ** $CondorPlatform: I386-LINUX_RH9 $
1/31 19:19:46 ** PID = 21713
1/31 19:19:46 ******************************************************
1/31 19:19:46 Using config file: /home/condor/condor_config
1/31 19:19:46 Using local config files: /home/condor/hosts/<master>/condor_config.local
1/31 19:19:46 DaemonCore: Command Socket at <169.237.mm.nn:40295>
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 21714
1/31 19:19:46 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 21715
1/31 19:19:51 Can't connect to <169.237.mm.nn:9618>:0, errno = 111
1/31 19:19:51 Will keep trying for 10 seconds...
1/31 19:20:01 Connect failed for 10 seconds; returning FALSE
1/31 19:20:01 ERROR:
SECMAN:2003:TCP connection to <169.237.mm.nn:9618> failed

1/31 19:20:01 Can't send UPDATE_MASTER_AD to collector <master>.physics.ucdavis.edu <169.237.mm.nn>

Michael Hannon            mailto:hannon@xxxxxxxxxxxxxxxxxxx
Dept. of Physics          530.752.4966
University of California  530.752.4717 FAX
Davis, CA 95616-8677