
[Condor-users] VB: [Condor] Problem o2f-sth-lap-016.un.dr.dgcsystems.net: condor_schedd.exe exited (4)



Hi!

 

I am trying to configure Condor on a Linux cluster of 13 machines, plus an administrator machine running Windows 7.

 

I get two main error messages:

 

WARNING: Unable to determine local IP address. Condor might not work

properly until you set NETWORK_INTERFACE=<machine IP address>

 

and

 

In order for Condor to work properly you must set your CONDOR_CONFIG

environment variable to point to your Condor configuration file:

/home/sonia/condor-7.4.1/etc/condor_config before running Condor

commands/daemons.
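
The check that this second message implies can be sketched in a few lines of Python: is CONDOR_CONFIG set, and does it point at an existing file? This is only an illustration. The helper name check_condor_config is hypothetical and is not part of Condor, and the returned strings are made up for this sketch.

```python
import os


def check_condor_config(environ=os.environ):
    """Return a short status string for the CONDOR_CONFIG setting.

    Hypothetical helper for illustration only; not a Condor API.
    """
    path = environ.get("CONDOR_CONFIG")
    if not path:
        # Variable is absent or empty.
        return "unset"
    if not os.path.isfile(path):
        # Variable is set, but no regular file exists at that path.
        return "missing file"
    return "ok"


if __name__ == "__main__":
    # On the cluster above, the expected value would be
    # /home/sonia/condor-7.4.1/etc/condor_config (taken from the message).
    print(check_condor_config())
```

Running it in the same shell that starts the Condor commands shows whether that shell actually sees the variable.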

 

 

How should I solve these problems? I have tried several alternatives but it’s still not working…

Any hints?

 

Another error message that I receive now and then is the following:

 

This is an automated email from the Condor system

on machine "o2f-sth-lap-016.un.dr.dgcsystems.net".  Do not reply.

 

"C:\condor/bin/condor_schedd.exe" on "o2f-sth-lap-016.un.dr.dgcsystems.net" exited with status 4.

Condor will automatically restart this process in 11 seconds.

 

*** Last 20 line(s) of file C:\condor/log/SchedLog:

06/18 09:10:17 (pid:5892) Using local config sources:

06/18 09:10:17 (pid:5892)    C:\condor/condor_config.local

06/18 09:10:17 (pid:5892) DaemonCore: Command Socket at <10.110.44.113:62060>

06/18 09:10:17 (pid:5892) History file rotation is enabled.

06/18 09:10:17 (pid:5892)   Maximum history file size is: 20971520 bytes

06/18 09:10:17 (pid:5892)   Number of rotated history files is: 2

06/18 09:10:17 (pid:5892) my_popen: CreateProcess failed

06/18 09:10:17 (pid:5892) Failed to execute C:\condor/bin/condor_shadow.std.exe, ignoring

06/18 09:10:17 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).

06/18 09:10:37 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.

06/18 09:10:37 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.

06/18 09:10:37 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...

06/18 09:10:42 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).

06/18 09:11:02 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.

06/18 09:11:02 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.

06/18 09:11:02 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...

06/18 09:11:07 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).

06/18 09:11:27 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.

06/18 09:11:27 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.

06/18 09:11:27 (pid:5892) ERROR "FAILED TO SEND INITIAL KEEP ALIVE TO OUR PARENT <169.254.67.219:49157>" at line 9312 in file ..\src\condor_daemon_core.V6\daemon_core.cpp

*** End of file SchedLog
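
As a hedged aid to reading this log, the two addresses in it can be classified with Python's standard ipaddress module. The sketch only reports which range each address falls in and does not diagnose the failure. For context, 169.254.0.0/16 is the IPv4 link-local range, and Winsock error 10051 is WSAENETUNREACH (network unreachable). The function name classify is made up for this example.

```python
import ipaddress


def classify(addr):
    """Label an IP address as link-local, private, or other."""
    ip = ipaddress.ip_address(addr)
    # Check link-local first: Python also reports 169.254.0.0/16 as private.
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "other"


if __name__ == "__main__":
    # Addresses copied from the SchedLog excerpt above.
    for addr in ("10.110.44.113", "169.254.67.219"):
        print(addr, classify(addr))
```

The schedd's own command socket is on 10.110.44.113, while the parent it fails to reach is at 169.254.67.219, so the two daemons appear to be using different interfaces.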

 

What does this mean?

 

 

Cheers,

Sónia

 

 

Sónia Liléo
O2 Strandvägen 5B 114 51 Stockholm
Tel: +46 8 559 310 37 Mobile: +46 73 752 95 74

www.o2.se