
[Condor-users] Error when installing condor



Dear all:

 

I'm a new Condor user and this is my first time installing it.

After installing Condor as a central manager, I started it with “./sbin/condor_master”.

I then used “ps -ef | grep condor” to check the processes:

 

condor    8394     1  0 13:49 ?        00:00:00 ./sbin/condor_master

condor    8395  8394  0 13:49 ?        00:00:05 condor_startd -f

condor    8396  8394  0 13:49 ?        00:00:00 condor_schedd -f

root      8398  8396  0 13:49 ?        00:00:00 condor_procd -A /data/condor/log/procd_pipe.SCHEDD -R 10000000 -S 60 -C 60000

 

I think this is wrong. I then checked the log files (listed below) and found that there may be a problem with the TCP connection. All three log files say: “TCP connection to <10.122.226.129:9618> failed.” However, I have turned off my firewall. Could anybody help me find where the error is and how to fix it?
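
For reference, the "Connection refused (connect errno = 111)" message in the logs can be reproduced outside Condor with a plain TCP connect. Below is a minimal Python sketch, not part of Condor: the function name probe_tcp and the 3-second timeout are my own assumptions, and the address and port are the ones from the logs. On Linux, errno 111 is ECONNREFUSED, which usually means that nothing is listening on that port; a firewall that silently drops packets would more typically show up as a timeout instead.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connect and classify the outcome.

    Hypothetical helper for illustration; returns "open", "refused",
    "timeout", or "error: <errno name>".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): the host answered, but no
        # process accepted the connection on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets being dropped.
        return "timeout"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

Run on the central manager itself, probe_tcp("10.122.226.129", 9618) returning "refused" would be consistent with what all three logs report.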

 

 

The MasterLog file is:

 

03/01/11 13:49:10 Setting maximum accepts per cycle 4.

03/01/11 13:49:10 ******************************************************

03/01/11 13:49:10 ** condor_master (CONDOR_MASTER) STARTING UP

03/01/11 13:49:10 ** /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_master

03/01/11 13:49:10 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)

03/01/11 13:49:10 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON

03/01/11 13:49:10 ** $CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $

03/01/11 13:49:10 ** $CondorPlatform: X86_64-LINUX_x86_64_rhas_3 $

03/01/11 13:49:10 ** PID = 8394

03/01/11 13:49:10 ** Log last touched time unavailable (No such file or directory)

03/01/11 13:49:10 ******************************************************

03/01/11 13:49:10 Using config source: /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/etc/condor_config

03/01/11 13:49:10 Using local config sources:

03/01/11 13:49:10    /data/condor/condor_config.local

03/01/11 13:49:10 DaemonCore: command socket at <10.122.226.129:48922>

03/01/11 13:49:10 Setting maximum accepts per cycle 4.

03/01/11 13:49:10 Started DaemonCore process "/mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_startd", pid and pgroup = 8395

03/01/11 13:49:10 Started DaemonCore process "/mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_schedd", pid and pgroup = 8396

03/01/11 13:49:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:49:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:49:15 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:54:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:54:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:54:15 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:59:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:59:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:59:15 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 14:04:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 14:04:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 14:04:15 Failed to start non-blocking update to <10.122.226.129:9618>.

 

The SchedLog file is:

 

03/01/11 13:49:10 (pid:8396) Setting maximum accepts per cycle 4.

03/01/11 13:49:10 (pid:8396) ******************************************************

03/01/11 13:49:10 (pid:8396) ** condor_schedd (CONDOR_SCHEDD) STARTING UP

03/01/11 13:49:10 (pid:8396) ** /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_schedd

03/01/11 13:49:10 (pid:8396) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)

03/01/11 13:49:10 (pid:8396) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON

03/01/11 13:49:10 (pid:8396) ** $CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $

03/01/11 13:49:10 (pid:8396) ** $CondorPlatform: X86_64-LINUX_x86_64_rhas_3 $

03/01/11 13:49:10 (pid:8396) ** PID = 8396

03/01/11 13:49:10 (pid:8396) ** Log last touched time unavailable (No such file or directory)

03/01/11 13:49:10 (pid:8396) ******************************************************

03/01/11 13:49:10 (pid:8396) Using config source: /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/etc/condor_config

03/01/11 13:49:10 (pid:8396) Using local config sources:

03/01/11 13:49:10 (pid:8396)    /data/condor/condor_config.local

03/01/11 13:49:10 (pid:8396) DaemonCore: command socket at <10.122.226.129:48924>

03/01/11 13:49:10 (pid:8396) Setting maximum accepts per cycle 4.

03/01/11 13:49:10 (pid:8396) History file rotation is enabled.

03/01/11 13:49:10 (pid:8396)   Maximum history file size is: 20971520 bytes

03/01/11 13:49:10 (pid:8396)   Number of rotated history files is: 2

03/01/11 13:49:16 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:49:16 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:49:16 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:54:17 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:54:17 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:54:17 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:59:18 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:59:18 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:59:18 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 14:04:19 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 14:04:19 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 14:04:19 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.

 

The StartLog file is:

 

03/01/11 13:49:10 Setting maximum accepts per cycle 4.

03/01/11 13:49:10 ******************************************************

03/01/11 13:49:10 ** condor_startd (CONDOR_STARTD) STARTING UP

03/01/11 13:49:10 ** /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_startd

03/01/11 13:49:10 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)

03/01/11 13:49:10 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON

03/01/11 13:49:10 ** $CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $

03/01/11 13:49:10 ** $CondorPlatform: X86_64-LINUX_x86_64_rhas_3 $

03/01/11 13:49:10 ** PID = 8395

03/01/11 13:49:10 ** Log last touched time unavailable (No such file or directory)

03/01/11 13:49:10 ******************************************************

03/01/11 13:49:10 Using config source: /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/etc/condor_config

03/01/11 13:49:10 Using local config sources:

03/01/11 13:49:10    /data/condor/condor_config.local

03/01/11 13:49:10 DaemonCore: command socket at <10.122.226.129:48923>

03/01/11 13:49:10 Setting maximum accepts per cycle 4.

03/01/11 13:49:16 VM-gahp server reported an internal error

03/01/11 13:49:16 VM universe will be tested to check if it is available

03/01/11 13:49:16 History file rotation is enabled.

03/01/11 13:49:16   Maximum history file size is: 20971520 bytes

03/01/11 13:49:16   Number of rotated history files is: 2

03/01/11 13:49:16 slot1: New machine resource allocated

03/01/11 13:49:16 slot2: New machine resource allocated

03/01/11 13:49:16 About to run initial benchmarks.

03/01/11 13:49:21 Completed initial benchmarks.

03/01/11 13:49:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:49:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:49:25 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:49:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:49:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:49:26 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:54:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:54:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:54:25 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:54:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:54:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:54:26 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:59:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:59:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:59:25 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 13:59:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 13:59:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 13:59:26 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 14:04:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 14:04:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 14:04:25 Failed to start non-blocking update to <10.122.226.129:9618>.

03/01/11 14:04:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).

03/01/11 14:04:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.

|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.

03/01/11 14:04:26 Failed to start non-blocking update to <10.122.226.129:9618>.
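
To confirm that every failure in the three logs points at the same address, the "attempt to connect" lines can be tallied with a short script. This is only a sketch: the regular expression is written against the log lines shown above, and the function name count_failed_connects is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... attempt to connect to <10.122.226.129:9618> failed:
#       Connection refused (connect errno = 111).
FAILED_CONNECT = re.compile(
    r"attempt to connect to <(?P<addr>[^>]+)> failed: "
    r".+? \(connect errno = (?P<errno>\d+)\)"
)


def count_failed_connects(lines):
    """Count failed connection attempts per (address, errno) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match:
            counts[(match.group("addr"), int(match.group("errno")))] += 1
    return counts
```

For example, count_failed_connects(open("/data/condor/log/MasterLog")) would tally the MasterLog; the path is an assumption based on the log directory that appears in the condor_procd command line above.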