
[Condor-users] Condor central-manager not detecting the other machines in the pool



Hey!

I just installed Condor on three of my machines, with one as the
central manager (--type=submit,execute,manager) and the other two as
execute and submit machines (--type=submit,execute). The problem is
that my central manager doesn't seem to know about the other two
machines. Running condor_status on the central manager shows only one
machine in the pool.

On the machines other than the central manager, I am surprised to see
that condor_master has also started the condor_collector and
condor_negotiator daemons (with the -f flag on all the machines).
Does this indicate a problem?

Apart from this, here is a copy-paste from the MasterLog on one of the
submit machines.

==

8/5 17:39:47 ******************************************************
8/5 17:39:47 ** condor_master (CONDOR_MASTER) STARTING UP
8/5 17:39:47 ** /root/condor/condor-6.8.0/sbin/condor_master
8/5 17:39:47 ** $CondorVersion: 6.8.0 Jul 19 2006 $
8/5 17:39:47 ** $CondorPlatform: I386-LINUX_RHEL3 $
8/5 17:39:47 ** PID = 4219
8/5 17:39:47 ** Log last touched 8/5 17:35:49
8/5 17:39:47 ******************************************************
8/5 17:39:47 Using config source: /root/condor/condor-6.8.0/etc/condor_config
8/5 17:39:47 Using local config sources:
8/5 17:39:47    /home/condor/condor_config.local
8/5 17:39:47 DaemonCore: Command Socket at <*my-system-ip*:32770>
8/5 17:39:47 Warning: attempting to compare null hostnames in same_host.
8/5 17:39:47 Collector port not defined, will use default: 9618
8/5 17:39:47 Started DaemonCore process "/root/condor/condor-6.8.0/sbin/condor_collector", pid and pgroup = 4220
8/5 17:39:47 Started DaemonCore process "/root/condor/condor-6.8.0/sbin/condor_negotiator", pid and pgroup = 4221
8/5 17:39:47 Started DaemonCore process "/root/condor/condor-6.8.0/sbin/condor_schedd", pid and pgroup = 4222
8/5 17:39:47 Started DaemonCore process "/root/condor/condor-6.8.0/sbin/condor_startd", pid and pgroup = 4223
8/5 17:40:13 attempt to connect to <*my-system-ip*:32780> timed out
8/5 17:40:13 ERROR: SECMAN:2003:TCP connection to <*my-system-ip*:32780> failed

8/5 17:40:13 Failed to start non-blocking update to <*my-system-ip*:32775>.
8/5 17:45:13 attempt to connect to <*my-system-ip*:32787> timed out
8/5 17:45:13 ERROR: SECMAN:2003:TCP connection to <*my-system-ip*:32787> failed

8/5 17:45:13 Failed to start non-blocking update to <*my-system-ip*:32779>.
8/5 17:50:13 attempt to connect to <*my-system-ip*:32791> timed out
8/5 17:50:13 ERROR: SECMAN:2003:TCP connection to <*my-system-ip*:32791> failed

8/5 17:50:13 Failed to start non-blocking update to <*my-system-ip*:32781>.
8/5 17:55:13 attempt to connect to <*my-system-ip*:32796> timed out
8/5 17:55:13 ERROR: SECMAN:2003:TCP connection to <*my-system-ip*:32796> failed

8/5 17:55:13 Failed to start non-blocking update to <*my-system-ip*:32783>.
8/5 18:00:13 attempt to connect to <*my-system-ip*:32802> timed out
8/5 18:00:13 ERROR: SECMAN:2003:TCP connection to <*my-system-ip*:32802> failed

8/5 18:00:13 Failed to start non-blocking update to <*my-system-ip*:32787>.

==

I have replaced my system's IP address in the above log with
*my-system-ip* for security purposes.
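
For anyone who wants to check the same thing on their own nodes, here is
a minimal sketch in Python (not part of Condor; the function names
started_daemons and runs_central_manager_daemons are made up for
illustration). It assumes the "Started DaemonCore process" entries look
like the ones in the log above, possibly wrapped across lines, and
reports which daemons the master started.

```python
import re

# Matches "Started DaemonCore process" entries as they appear in the log
# above. \s+ also matches newlines, so entries wrapped across several
# lines are still recognised.
STARTED = re.compile(
    r'Started DaemonCore process\s+"(?P<path>[^"]+)",'
    r'\s+pid and pgroup\s+=\s+(?P<pid>\d+)'
)


def started_daemons(log_text):
    """Return {daemon_name: pid} for each daemon the master reports starting."""
    daemons = {}
    for match in STARTED.finditer(log_text):
        # Keep only the executable name, e.g. "condor_collector".
        name = match.group("path").rsplit("/", 1)[-1]
        daemons[name] = int(match.group("pid"))
    return daemons


def runs_central_manager_daemons(log_text):
    """True if both the collector and the negotiator were started."""
    names = started_daemons(log_text)
    return "condor_collector" in names and "condor_negotiator" in names
```

To use it, read the contents of a node's MasterLog into a string and
pass it to runs_central_manager_daemons(); a result of True on a machine
that is not meant to be the central manager matches what I am seeing.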

Any ideas as to what is wrong?

Thanks for your time.