
Re: [Condor-users] condor status

We're also having a similar communication problem. Our pool machine (also Fedora 4) cannot communicate with the collector because of a getpeername failure. This can be seen in the StartLog:

Attempting to send update via UDP to collector supaman.cs.ucl.ac.uk <>
3/16 23:40:25 (fd:6) (pid:7925) SECMAN: need to start a session via TCP
3/16 23:40:25 (fd:6) (pid:7925) SEC_TCP_SESSION_TIMEOUT is undefined, using default value of 20
3/16 23:40:25 (fd:6) (pid:7925) SECMAN: setting timeout to 20 seconds.
3/16 23:40:25 (fd:7) (pid:7925) PRIV_CONDOR --> PRIV_ROOT at sock.C:506
3/16 23:40:25 (fd:7) (pid:7925) PRIV_ROOT --> PRIV_CONDOR at sock.C:512
3/16 23:40:25 (fd:7) (pid:7925) getpeername failed so connect must have failed
3/16 23:40:26 (fd:7) (pid:7925) PRIV_CONDOR --> PRIV_ROOT at sock.C:506

Since we don't have access to the source file sock.C, we lack the information needed to resolve this. Any ideas?
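
Since the log says the TCP connect itself failed, one thing that might be worth ruling out is plain network reachability of the collector. Below is a minimal sketch, not a Condor tool: check_tcp is a hypothetical helper name, it assumes a bash build with /dev/tcp support and the coreutils timeout command, and the port 9618 in the usage comment is an assumption (use whatever port your collector is actually configured on).

```shell
#!/usr/bin/env bash
# Sketch only: check_tcp is a hypothetical helper, not part of Condor.
# Succeeds (exit 0) if a TCP connection to HOST:PORT opens within 5 seconds.
check_tcp() {
    # bash's /dev/tcp pseudo-device opens a TCP socket;
    # timeout bounds the wait if packets are silently dropped.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (9618 is an assumed collector port; adjust to your configuration):
#   check_tcp supaman.cs.ucl.ac.uk 9618 && echo "reachable" || echo "connect failed"
```

If this fails from the pool machine but works from another host, a firewall or routing issue between the two machines would be a more likely suspect than Condor itself.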

----- Original Message -----
From: "Nick LeRoy" <nleroy@xxxxxxxxxxx>
To: <condor-users@xxxxxxxxxxx>
Sent: Thursday, March 16, 2006 4:58 PM
Subject: Re: [Condor-users] condor status

On Thursday 16 March 2006 9:49 am, hicham rahaliii wrote:
I have 5 machines in my Condor pool, all of them running Condor 6.6.10,
but when I run condor_status I see no machines.
What can I do?

Sigh, this should be in a FAQ.

There are a lot of things that could be causing this... Here's a quick
checklist on where to start:

1. Is condor actually running on these machines (try:
"ps auxww | grep condor_")?

2. Is condor_collector running on your central manager (again, verify with
"ps")?  I suspect that it is, because you'd have seen an error from
condor_status otherwise...

3. Is condor_startd running on the pool machines?
If not:
 b) Look in StartLog
 c) Look in MasterLog

4. Is CONDOR_HOST set properly on the pool machines?
 a) condor_config_val CONDOR_HOST
 b) Look in StartLog
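
The checks above could be scripted roughly as follows. This is a sketch under stated assumptions: is_running and run_checks are hypothetical helper names, and only "ps auxww" and "condor_config_val CONDOR_HOST" come from the checklist itself.

```shell
#!/usr/bin/env bash
# Sketch only: is_running and run_checks are hypothetical helpers.

# is_running NAME: reads process-listing text on stdin and succeeds if
# some line mentions NAME (lines containing "grep" are ignored).
is_running() {
    grep -v grep | grep -q "$1"
}

# run_checks: walk the checklist on the local machine.
# condor_collector is only expected to run on the central manager.
run_checks() {
    for daemon in condor_master condor_startd condor_collector; do
        if ps auxww | is_running "$daemon"; then
            echo "$daemon: running"
        else
            echo "$daemon: not running (look in StartLog and MasterLog)"
        fi
    done
    # Show which central manager this machine is configured to report to.
    echo "CONDOR_HOST = $(condor_config_val CONDOR_HOST)"
}
```

Calling run_checks on each pool machine, and once on the central manager, gives a quick overview before digging into the logs.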

Thanks for your help.

Hopefully, this will be enough to get you started.


          <<< Why, oh, why, didn't I take the blue pill? >>>
/`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
\    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
|_*_|   608-265-5761                    Department of Computer Sciences