Re: [Condor-users] VB: [Condor] Problem o2f-sth-lap-016.un.dr.dgcsystems.net: condor_schedd.exe exited (4)



Hi Sonia:

> I am trying to configure Condor on a linux cluster consisting of 13 machines plus the administrator machine that is running windows 7.
> 
> I get two main error messages that are the following,
> 
> WARNING: Unable to determine local IP address. Condor  might not work
> 
> properly until you set NETWORK_INTERFACE=<machine IP address>

Silly question: does the computer have an IP address?  If it does, I sometimes find that adding the host's IP address and hostname to the 'hosts' file solves this problem.  Try adding a line to:

C:\WINDOWS\system32\drivers\etc\hosts

Then restart Condor and see if it helped.
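For illustration only, here is a rough Python sketch of what such a line looks like and how to see what the resolver currently returns. The helper names hosts_entry and resolved_addresses are made up for this example, and the address 10.110.44.113 is only a guess taken from the "Command Socket" line in your SchedLog; substitute the machine's real address.

```python
import socket


def hosts_entry(ip: str, fqdn: str) -> str:
    """Build one hosts-file line: address, full name, then short alias."""
    short = fqdn.split(".", 1)[0]
    # Only add the short alias when it differs from the full name.
    names = [fqdn] if short == fqdn else [fqdn, short]
    return ip + "\t" + "\t".join(names)


def resolved_addresses(name: str) -> list:
    """Return the IPv4 addresses the resolver currently gives for name."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        # Lookup failed (unknown host, no resolver, ...): report nothing.
        return []
```

Calling hosts_entry("10.110.44.113", "o2f-sth-lap-016.un.dr.dgcsystems.net") gives a tab-separated line you could paste into the hosts file, and resolved_addresses(socket.getfqdn()) shows what the machine's own name resolves to before and after the change.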


> In order for Condor to work properly you must set your CONDOR_CONFIG
> environment variable to point to your Condor configuration file:
> /home/sonia/condor-7.4.1/etc/condor_config before running Condor
> commands/daemons.
> How should I solve these problems? I have tried several alternatives but it's still not working...


Alternatives to what?  Can you be more specific about what you have tried?  Is CONDOR_CONFIG set in your personal environment, or in the system's?  If you are running Condor as a service, it needs to be set in the system environment; otherwise Condor will not be able to find it.
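If it helps, a quick way to see what a given environment actually contains is a small check like the one below. This is a sketch, not part of Condor: check_condor_config is a hypothetical helper, and it only inspects the environment of the process that runs it, so run from your own shell it tells you nothing about what a service sees.

```python
import os


def check_condor_config(env=None):
    """Report whether CONDOR_CONFIG is set and points at an existing file."""
    # Default to the environment of the current process.
    env = os.environ if env is None else env
    path = env.get("CONDOR_CONFIG")
    if not path:
        return "CONDOR_CONFIG is not set in this environment"
    if not os.path.isfile(path):
        return "CONDOR_CONFIG points to a missing file: " + path
    return "CONDOR_CONFIG ok: " + path


if __name__ == "__main__":
    print(check_condor_config())
```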


> Another error message that I receive now and then is the following,
> 
> This is an automated email from the Condor system
> 
> on machine "o2f-sth-lap-016.un.dr.dgcsystems.net".  Do not reply.
> 
> "C:\condor/bin/condor_schedd.exe" on "o2f-sth-lap-016.un.dr.dgcsystems.net" exited with status 4.
> 
> Condor will automatically restart this process in 11 seconds.
> 
> 
> *** Last 20 line(s) of file C:\condor/log/SchedLog:
> 
> 06/18 09:10:17 (pid:5892) Using local config sources:
> 
> 06/18 09:10:17 (pid:5892)    C:\condor/condor_config.local
> 
> 06/18 09:10:17 (pid:5892) DaemonCore: Command Socket at <10.110.44.113:62060>
> 
> 06/18 09:10:17 (pid:5892) History file rotation is enabled.
> 
> 06/18 09:10:17 (pid:5892)   Maximum history file size is: 20971520 bytes
> 
> 06/18 09:10:17 (pid:5892)   Number of rotated history files is: 2
> 
> 06/18 09:10:17 (pid:5892) my_popen: CreateProcess failed
> 
> 06/18 09:10:17 (pid:5892) Failed to execute C:\condor/bin/condor_shadow.std.exe, ignoring
> 
> 06/18 09:10:17 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).
> 
> 06/18 09:10:37 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
> 
> 06/18 09:10:37 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
> 
> 06/18 09:10:37 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...
> 
> 06/18 09:10:42 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).
> 
> 06/18 09:11:02 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
> 
> 06/18 09:11:02 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
> 
> 06/18 09:11:02 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...
> 
> 06/18 09:11:07 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).
> 
> 06/18 09:11:27 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
> 
> 06/18 09:11:27 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
> 
> 06/18 09:11:27 (pid:5892) ERROR "FAILED TO SEND INITIAL KEEP ALIVE TO OUR PARENT <169.254.67.219:49157>" at line 9312 in file ..\src\condor_daemon_core.V6\daemon_core.cpp
> 
> *** End of file SchedLog
> 
> 
> 
> What does this mean?


Socket error 10051 (WSAENETUNREACH) means that the destination network is unreachable, for one reason or another (probably the lack of a valid IP address).  Can you ping google.com from the command line?
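One thing worth noticing in the log: the schedd's own command socket is on 10.110.44.113, but the parent it fails to reach is on 169.254.67.219, and 169.254.x.x is the link-local range that a machine assigns itself when it gets no address from DHCP. A small Python sketch (the classify helper and its labels are my own, not anything Condor provides) that pulls the <ip:port> pairs out of a log and labels them:

```python
import ipaddress
import re

# Matches the <a.b.c.d:port> form used in the Condor log lines above.
ADDR = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def classify(log_text):
    """Map each IPv4 address found in log_text to a rough category."""
    result = {}
    for host, _port in ADDR.findall(log_text):
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            # Looked like an address but is not valid (e.g. 999.1.1.1).
            continue
        if ip.is_link_local:
            kind = "link-local (self-assigned)"
        elif ip.is_private:
            kind = "private"
        else:
            kind = "public"
        result[host] = kind
    return result
```

Feeding it the contents of SchedLog should show at a glance whether the daemons are trying to talk over two different interfaces.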

Regards,
-B