
Re: [Condor-users] condor_schedd.exe exited (4)



Thanks for your answer Ben!

Is there an equivalent 'hosts' file on a Linux system? Where is it located?

Concerning the CONDOR_CONFIG environment variable, I checked whether the condor.sh and condor.csh files give the location of the condor_config file, and it looks correct.

Cheers,
Sónia
 

-----Original message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On behalf of Burnett, Ben
Sent: 18 June 2010 10:00
To: Condor-Users Mail List
Subject: Re: [Condor-users] FW: [Condor] Problem o2f-sth-lap-016.un.dr.dgcsystems.net: condor_schedd.exe exited (4)

Hi Sonia:

> I am trying to configure Condor on a Linux cluster consisting of 13 machines plus the administrator machine, which is running Windows 7.
> 
> I get two main error messages:
> 
> WARNING: Unable to determine local IP address. Condor might not work
> properly until you set NETWORK_INTERFACE=<machine IP address>

Silly question: does the computer have an IP address?  If it does, then I sometimes find that putting the host's IP address and hostname in the 'hosts' file solves this problem.  Try adding a line to:

C:\WINDOWS\system32\drivers\etc\hosts

Then restart Condor and see if it helped.
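
To see what the machine's hostname currently resolves to, before and after editing the 'hosts' file, something like the following can be run by hand. This is only a minimal sketch, assuming Python 3 is installed on the machine; the helper names `usable` and `local_addresses` are made up for illustration and are not part of Condor, and Condor's own address detection may work differently.

```python
import ipaddress
import socket


def usable(addr: str) -> bool:
    """Return True if addr is not loopback, link-local (169.254.x.x) or 0.0.0.0."""
    ip = ipaddress.ip_address(addr)
    return not (ip.is_loopback or ip.is_link_local or ip.is_unspecified)


def local_addresses() -> list:
    """Resolve this machine's hostname to its IPv4 addresses (may be empty)."""
    try:
        _, _, addrs = socket.gethostbyname_ex(socket.gethostname())
    except OSError:
        # Covers socket.gaierror / socket.herror: the name did not resolve.
        return []
    return addrs


if __name__ == "__main__":
    addrs = local_addresses()
    if not addrs:
        print("hostname did not resolve; consider adding it to the hosts file")
    for addr in addrs:
        print(addr, "ok" if usable(addr) else "not reachable from other machines")
```

If the only addresses printed are 127.0.0.1 or something in 169.254.x.x, other machines in the pool will not be able to reach this one at that address.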


> In order for Condor to work properly you must set your CONDOR_CONFIG
> environment variable to point to your Condor configuration file:
> /home/sonia/condor-7.4.1/etc/condor_config before running Condor
> commands/daemons.
> How should I solve these problems? I have tried several alternatives but it's still not working...


Alternatives to what?  Can you be more specific about what you have tried?  Is the variable set in your personal environment, or in the system's?  If you are running Condor as a service, it needs to be in the system's; otherwise Condor will not be able to find it.
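
A quick way to check what a given environment actually sees is to run a small script from it. The following is a rough sketch, assuming Python 3 is available; `check_condor_config` is a hypothetical helper written for this message, not a Condor tool.

```python
import os


def check_condor_config(environ=os.environ) -> str:
    """Describe whether CONDOR_CONFIG is set and points at a readable file."""
    path = environ.get("CONDOR_CONFIG")
    if not path:
        return "CONDOR_CONFIG is not set in this environment"
    if not os.path.isfile(path):
        return f"CONDOR_CONFIG is set to {path}, but no such file exists"
    if not os.access(path, os.R_OK):
        return f"CONDOR_CONFIG is set to {path}, but it is not readable"
    return f"CONDOR_CONFIG points at {path}"


if __name__ == "__main__":
    print(check_condor_config())
```

The result depends on who runs it: a shell that has sourced condor.sh may report the variable as set, while a daemon or service started from a different environment may not see it at all.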



> Another error message that I receive now and then is the following,
> 
> This is an automated email from the Condor system
> 
> on machine "o2f-sth-lap-016.un.dr.dgcsystems.net".  Do not reply.
> 
> "C:\condor/bin/condor_schedd.exe" on "o2f-sth-lap-016.un.dr.dgcsystems.net" exited with status 4.
> 
> Condor will automatically restart this process in 11 seconds.
> 
> 
> *** Last 20 line(s) of file C:\condor/log/SchedLog:
> 
> 06/18 09:10:17 (pid:5892) Using local config sources:
> 
> 06/18 09:10:17 (pid:5892)    C:\condor/condor_config.local
> 
> 06/18 09:10:17 (pid:5892) DaemonCore: Command Socket at <10.110.44.113:62060>
> 
> 06/18 09:10:17 (pid:5892) History file rotation is enabled.
> 
> 06/18 09:10:17 (pid:5892)   Maximum history file size is: 20971520 bytes
> 
> 06/18 09:10:17 (pid:5892)   Number of rotated history files is: 2
> 
> 06/18 09:10:17 (pid:5892) my_popen: CreateProcess failed
> 
> 06/18 09:10:17 (pid:5892) Failed to execute C:\condor/bin/condor_shadow.std.exe, ignoring
> 
> 06/18 09:10:17 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).
> 
> 06/18 09:10:37 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
> 
> 06/18 09:10:37 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
> 
> 06/18 09:10:37 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...
> 
> 06/18 09:10:42 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).
> 
> 06/18 09:11:02 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
> 
> 06/18 09:11:02 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
> 
> 06/18 09:11:02 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...
> 
> 06/18 09:11:07 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.  Will keep trying for 20 total seconds (20 to go).
> 
> 06/18 09:11:27 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
> 
> 06/18 09:11:27 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
> 
> 06/18 09:11:27 (pid:5892) ERROR "FAILED TO SEND INITIAL KEEP ALIVE TO OUR PARENT <169.254.67.219:49157>" at line 9312 in file ..\src\condor_daemon_core.V6\daemon_core.cpp
> 
> *** End of file SchedLog
> 
> 
> 
> What does this mean?


Socket error 10051 (WSAENETUNREACH) means that the destination network is unreachable for one reason or another, probably the lack of an IP address.  Can you ping google.com from the command line?
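
One detail in the log is worth noting: the schedd's own command socket is at 10.110.44.113, but the parent it keeps trying to reach is at 169.254.67.219, and 169.254.0.0/16 is the link-local range that Windows assigns to an interface when it gets no address from DHCP. The sketch below pulls the failed connection attempts out of SchedLog-style lines and flags such addresses. It assumes Python 3 and the log line format quoted above; the names `summarize` and `report` are illustrative only, and the error table lists just the one Winsock code seen here.

```python
import ipaddress
import re
from collections import Counter

# Matches lines such as:
#   attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
CONNECT_FAILED = re.compile(
    r"attempt to connect to <(?P<ip>[\d.]+):(?P<port>\d+)> failed: "
    r"connect errno = (?P<errno>\d+)"
)

# Winsock error codes; only the one relevant to this log is listed.
WINSOCK_ERRORS = {10051: "WSAENETUNREACH (network is unreachable)"}


def summarize(log_lines):
    """Count failed connection attempts per (ip, errno) pair."""
    counts = Counter()
    for line in log_lines:
        m = CONNECT_FAILED.search(line)
        if m:
            counts[(m["ip"], int(m["errno"]))] += 1
    return counts


def report(counts):
    """Turn the counts into human-readable lines."""
    out = []
    for (ip, code), n in counts.items():
        meaning = WINSOCK_ERRORS.get(code, "unknown code")
        is_ll = ipaddress.ip_address(ip).is_link_local
        note = " [link-local address]" if is_ll else ""
        out.append(f"{ip}{note}: {n} failure(s), errno {code} = {meaning}")
    return out


if __name__ == "__main__":
    sample = [
        "06/18 09:10:17 (pid:5892) attempt to connect to "
        "<169.254.67.219:49157> failed: connect errno = 10051.",
        "06/18 09:10:37 (pid:5892) Failed to send alive to "
        "<169.254.67.219:49157>, will try again...",
    ]
    print("\n".join(report(summarize(sample))))
```

If the address flagged as link-local belongs to an unused or disconnected adapter, that would fit with the NETWORK_INTERFACE warning quoted earlier.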

Regards,
-B
