
Re: [condor-users] Troubleshooting process



On Mon, 1 Mar 2004 kge2@xxxxxxxx wrote:

> I am attempting to troubleshoot a new condor installation.  How do I verify that
> the server is acting as a central manager of type "submit, manager" (ie, no
> execute) and is accepting execute clients on the nic having an ip address of
> 172.16.0.1 (it has two nics)?
>
> I have set it up accordingly and followed the steps for a dual-nic setup, but I
> want to somehow verify that the server is working as it should so that I can
> then start troubleshooting the clients.

There are several ways to see if a machine is set to execute jobs in your
Condor pool:

* Run condor_status and see if the hostname appears in the output. If the
hostname doesn't appear, then Condor isn't aware of it as an execute node.

* On the machine, run ps and look for any process named condor_startd.
condor_startd is the daemon that makes a machine an execute node.

* On the machine, run condor_config_val -master DAEMON_LIST and see if
"STARTD" appears in the results. This will tell you if Condor is
configured to run the condor_startd daemon. You can also look in the
config file, which is where you'd edit DAEMON_LIST to remove STARTD if
it's listed and you don't want the machine to execute jobs.

* On the machine, look for a file StartLog in the Condor log directory. If
it's present and has recent entries in it, the condor_startd is probably
running.
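
If you'd rather script the DAEMON_LIST check than eyeball it, here is a
minimal Python sketch. It assumes you have already captured the value
printed by condor_config_val as a string, and that the entries are
separated by commas and/or whitespace. The function name has_startd and
the sample values are made up for illustration.

```python
import re

def has_startd(daemon_list_value: str) -> bool:
    """Return True if STARTD is among the daemons named in a DAEMON_LIST value."""
    # Assumption: entries are separated by commas and/or whitespace.
    names = [n.upper() for n in re.split(r"[,\s]+", daemon_list_value.strip()) if n]
    return "STARTD" in names

# Hypothetical values, not taken from a real pool.
print(has_startd("MASTER, COLLECTOR, NEGOTIATOR, SCHEDD"))  # False: no execute daemon
print(has_startd("MASTER, STARTD"))                         # True: execute node
```

A False result for the central manager and True for each client is what
a "submit, manager" server with separate execute nodes should give.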

As for what interface the Condor daemons are using, every Condor daemon
writes something like the following to its log when it starts:

12/31 16:58:38 ******************************************************
12/31 16:58:38 ** condor_master (CONDOR_MASTER) STARTING UP
12/31 16:58:38 ** $CondorVersion: 6.6.1 Dec 30 2003 RH9-BRANCH-PRE-RELEASE $
12/31 16:58:38 ** $CondorPlatform: I386-LINUX-RH9 $
12/31 16:58:38 ** PID = 7125
12/31 16:58:38 ******************************************************
12/31 16:58:38 Using config file: /some/path/name
12/31 16:58:38 Using local config files: /some/other/path/name
12/31 16:58:38 DaemonCore: Command Socket at <128.105.111.110:32873>

That last line tells you the IP address and port the daemon is listening
on. All outgoing connections will be made on the same network interface.
In your case it should read 172.16.0.1. If it reads 127.0.0.1, you're
going to have problems, since no other machine can reach that address. :-)
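
To check this across several daemon logs, something like the following
Python sketch could pull the address out of the "Command Socket" line
and flag a loopback address. It assumes the line has the same format as
the sample log above. The names command_socket and is_loopback are my
own, not part of Condor.

```python
import ipaddress
import re

# Matches lines like: DaemonCore: Command Socket at <128.105.111.110:32873>
SOCKET_RE = re.compile(
    r"DaemonCore: Command Socket at <(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>"
)

def command_socket(log_text):
    """Return (ip, port) from the first Command Socket line, or None if absent."""
    m = SOCKET_RE.search(log_text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

def is_loopback(ip):
    """True for 127.0.0.0/8 addresses, which other machines cannot reach."""
    return ipaddress.ip_address(ip).is_loopback

line = "12/31 16:58:38 DaemonCore: Command Socket at <128.105.111.110:32873>"
ip, port = command_socket(line)
print(ip, port, is_loopback(ip))  # 128.105.111.110 32873 False
```

On a dual-nic machine you would compare the returned address against the
one you expect (172.16.0.1 in the question) rather than only checking
for loopback.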

+------------------------------------+-------------------------------+
|             Jaime Frey             |There are 10 types of people in|
|         jfrey@xxxxxxxxxxx          |the world: Those who understand|
|   http://www.cs.wisc.edu/~jfrey/   |  binary, and those who don't  |
+------------------------------------+-------------------------------+