
[Condor-users] Strange Errors and Daemons not Starting



I'm having some strange issues getting Condor running on a group of Windows XP computers in a lab.  Even though the Condor service is set to start automatically, Condor does not start when Windows is rebooted.  When I start the Condor service manually, it starts fine, but I notice some strange errors in the logs and a strange listing in condor_status.
 
Here is the output of condor_status:
  C:\>condor_status
  Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
  vm1@xxxxxxxxx LINUX       X86_64 Unclaimed  Idle       0.000   987  0+00:00:01
  vm2@xxxxxxxxx LINUX       X86_64 Unclaimed  Idle       0.000   987  0+00:00:04
  vm3@xxxxxxxxx LINUX       X86_64 Unclaimed  Idle       0.000   987  0+22:42:32
  vm4@xxxxxxxxx LINUX       X86_64 Unclaimed  Idle       0.000   987  0+00:00:07
  014132CTIS115 WINNT51     INTEL  Owner      Idle       0.000   253  0+00:09:53
  vm1@05004CTIS WINNT51     INTEL  Owner      Idle       0.000   507  0+00:05:14
  vm2@05004CTIS WINNT51     INTEL  Owner      Idle       0.000   507  0+00:05:15
  vm1@05005CTIS WINNT51     INTEL  Owner      Idle       0.100   507[?????]
  vm2@05005CTIS WINNT51     INTEL  Owner      Idle       1.000   507[?????]
  vm1@pcclo332- WINNT51     INTEL  Unclaimed  Idle       0.020   511  0+00:40:04
Notice the strange [?????] in place of the activity time for vm1@05005CTIS and vm2@05005CTIS.  This is the machine I am having these issues with.  When I look at its logs, here is what I get:
 
MasterLog
7/12 13:23:54 ******************************************************
7/12 13:23:54 ** Condor (CONDOR_MASTER) STARTING UP
7/12 13:23:54 ** c:\Condor\bin\condor_master.exe
7/12 13:23:54 ** $CondorVersion: 6.7.20 Jun 21 2006 $
7/12 13:23:54 ** $CondorPlatform: INTEL-WINNT50 $
7/12 13:23:54 ** PID = 3212
7/12 13:23:54 ** Log last touched 7/10 13:52:17
7/12 13:23:54 ******************************************************
7/12 13:23:54 Using config source: C:\Condor\condor_config
7/12 13:23:54 Using local config sources:
7/12 13:23:54    \\condor.calumet.purdue.edu\condorconfig$\mainlab\baya\condor_config.net
7/12 13:23:54 DaemonCore: Command Socket at <205.215.115.172:23899>
7/12 13:23:54 Started DaemonCore process "C:\Condor\bin\condor_startd.exe", pid and pgroup = 3164
7/12 13:23:54 Started DaemonCore process "C:\Condor\bin\condor_schedd.exe", pid and pgroup = 3772
7/12 13:23:56 DaemonCore: Command received via UDP from host <205.215.115.172:23550>
7/12 13:23:56 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
7/12 13:23:56 The STARTD (pid 3164) exited with status 4
7/12 13:23:56 Sending obituary for "C:\Condor\bin\condor_startd.exe"
7/12 13:23:56 restarting C:\Condor\bin\condor_startd.exe in 10 seconds
7/12 13:24:04 DaemonCore: Command received via UDP from host <205.215.115.172:23648>
7/12 13:24:04 DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
7/12 13:24:04 Got SIGTERM. Performing graceful shutdown.
7/12 13:24:04 Canceling timer to re-start STARTD
7/12 13:24:04 Sent signal 15 to SCHEDD (pid 3772)
7/12 13:24:05 DaemonCore: Command received via UDP from host <205.215.115.172:23630>
7/12 13:24:05 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
7/12 13:24:05 The SCHEDD (pid 3772) exited with status 0
7/12 13:24:05 All daemons are gone.  Exiting.
7/12 13:24:05 **** Condor (condor_MASTER) EXITING WITH STATUS 0
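
The line that stands out in the MasterLog is "The STARTD (pid 3164) exited with status 4". For anyone comparing MasterLogs from several lab machines, here is a minimal Python sketch that pulls out every daemon exit and keeps only the non-zero ones. It assumes only the line format shown above; the function names are my own, not part of Condor.

```python
import re

# Matches MasterLog lines such as:
#   7/12 13:23:56 The STARTD (pid 3164) exited with status 4
EXIT_RE = re.compile(
    r"^(?P<stamp>\d+/\d+ \d\d:\d\d:\d\d) The (?P<daemon>\w+) "
    r"\(pid (?P<pid>\d+)\) exited with status (?P<status>\d+)"
)


def daemon_exits(lines):
    """Return (timestamp, daemon, pid, status) for each daemon exit line."""
    exits = []
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if m:
            exits.append((m["stamp"], m["daemon"], int(m["pid"]), int(m["status"])))
    return exits


def abnormal_exits(lines):
    """Keep only exits with a non-zero status, i.e. daemons that did not exit cleanly."""
    return [e for e in daemon_exits(lines) if e[3] != 0]


if __name__ == "__main__":
    sample = [
        "7/12 13:23:54 Started DaemonCore process ...",
        "7/12 13:23:56 The STARTD (pid 3164) exited with status 4",
        "7/12 13:24:05 The SCHEDD (pid 3772) exited with status 0",
    ]
    for stamp, daemon, pid, status in abnormal_exits(sample):
        print(f"{stamp} {daemon} pid={pid} status={status}")
```

Run against the sample it reports only the STARTD exit; to scan a real machine, pass it the lines of that machine's MasterLog instead.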

and

SchedLog
7/10 13:52:17 (pid:1572) **** condor_schedd.exe (condor_SCHEDD) EXITING WITH STATUS 0
7/12 13:23:55 (pid:3772) ******************************************************
7/12 13:23:55 (pid:3772) ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
7/12 13:23:55 (pid:3772) ** C:\Condor\bin\condor_schedd.exe
7/12 13:23:55 (pid:3772) ** $CondorVersion: 6.7.20 Jun 21 2006 $
7/12 13:23:55 (pid:3772) ** $CondorPlatform: INTEL-WINNT50 $
7/12 13:23:55 (pid:3772) ** PID = 3772
7/12 13:23:55 (pid:3772) ** Log last touched 7/10 13:52:17
7/12 13:23:55 (pid:3772) ******************************************************
7/12 13:23:55 (pid:3772) Using config source: C:\Condor\condor_config
7/12 13:23:55 (pid:3772) Using local config sources:
7/12 13:23:55 (pid:3772)    \\condor.calumet.purdue.edu\condorconfig$\mainlab\baya\condor_config.net
7/12 13:23:55 (pid:3772) DaemonCore: Command Socket at <205.215.115.172:23445>
7/12 13:23:55 (pid:3772) History file rotation is enabled.
7/12 13:23:55 (pid:3772)   Maximum history file size is: 20971520 bytes
7/12 13:23:55 (pid:3772)   Number of rotated history files is: 2
7/12 13:23:55 (pid:3772) my_popen: CreateProcess failed
7/12 13:23:55 (pid:3772) Failed to execute C:\Condor\bin\condor_shadow.pvm, ignoring
7/12 13:23:55 (pid:3772) my_popen: CreateProcess failed
7/12 13:23:55 (pid:3772) Failed to execute C:\Condor\bin\condor_shadow.std, ignoring
7/12 13:24:04 (pid:3772) DaemonCore: Command received via UDP from host <205.215.115.172:23440>
7/12 13:24:04 (pid:3772) DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
7/12 13:24:04 (pid:3772) Got SIGTERM. Performing graceful shutdown.
7/12 13:24:04 (pid:3772) Deleting CronMgr
7/12 13:24:04 (pid:3772) All shadows are gone, exiting.
7/12 13:24:04 (pid:3772) **** condor_schedd.exe (condor_SCHEDD) EXITING WITH STATUS 0
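
The SchedLog errors come in pairs: "my_popen: CreateProcess failed" followed by "Failed to execute ..., ignoring". Since the schedd itself says it is ignoring them, they may not be related to the startd problem, but here is a minimal Python sketch that lists which programs the schedd could not launch. It assumes only the line format shown above; the function name is hypothetical.

```python
import re

# Matches SchedLog lines such as:
#   7/12 13:23:55 (pid:3772) Failed to execute C:\Condor\bin\condor_shadow.pvm, ignoring
FAILED_EXEC_RE = re.compile(
    r"^\d+/\d+ \d\d:\d\d:\d\d \(pid:(?P<pid>\d+)\) "
    r"Failed to execute (?P<path>.+?), ignoring$"
)


def failed_executables(lines):
    """Return (pid, path) for each program the schedd reported it could not launch."""
    found = []
    for line in lines:
        m = FAILED_EXEC_RE.match(line.rstrip())
        if m:
            found.append((int(m["pid"]), m["path"]))
    return found


if __name__ == "__main__":
    sample = [
        r"7/12 13:23:55 (pid:3772) my_popen: CreateProcess failed",
        r"7/12 13:23:55 (pid:3772) Failed to execute C:\Condor\bin\condor_shadow.pvm, ignoring",
        r"7/12 13:23:55 (pid:3772) Failed to execute C:\Condor\bin\condor_shadow.std, ignoring",
    ]
    for pid, path in failed_executables(sample):
        print(pid, path)
```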

What do the experts here think could be the problem? ;)

John Alberts
Technical Assistant for EMS
alberts@xxxxxxxxxxxxxxxxxx
219-989-2083
CLO 332
http://public.xdi.org/=john.alberts