
[Condor-users] startd terminate and restart




I had a WinXP SP2 machine running Condor 6.6.11 as a submit/execute node, and its MasterLog was filling up with this:

7/12 16:47:12 ******************************************************
7/12 16:47:12 ** Condor (CONDOR_MASTER) STARTING UP
7/12 16:47:12 ** C:\Condor\bin\condor_master.exe
7/12 16:47:12 ** $CondorVersion: 6.6.11 Mar 23 2006 $
7/12 16:47:12 ** $CondorPlatform: INTEL-WINNT50 $
7/12 16:47:12 ** PID = 4040
7/12 16:47:12 ******************************************************
7/12 16:47:12 Using config file: C:\Condor\condor_config
7/12 16:47:12 Using local config files: C:\Condor\condor_config.local
7/12 16:47:12 DaemonCore: Command Socket at <x.x.x.x:2860>
7/12 16:47:12 Started DaemonCore process "C:\Condor\bin\condor_startd.exe", pid and pgroup = 1272
7/12 16:47:12 Started DaemonCore process "C:\Condor\bin\condor_schedd.exe", pid and pgroup = 2760
7/12 16:47:12 DaemonCore: Command received via UDP from host <x.x.x.x:2863>
7/12 16:47:12 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
7/12 16:47:12 The STARTD (pid 1272) exited with status -1073741521
7/12 16:47:12 Sending obituary for "C:\Condor\bin\condor_startd.exe"
7/12 16:47:13 restarting C:\Condor\bin\condor_startd.exe in 10 seconds
7/12 16:47:23 Started DaemonCore process "C:\Condor\bin\condor_startd.exe", pid and pgroup = 280
7/12 16:47:23 DaemonCore: Command received via UDP from host <x.x.x.x:2874>
7/12 16:47:23 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
7/12 16:47:23 The STARTD (pid 280) exited with status -1073741521
7/12 16:47:23 Sending obituary for "C:\Condor\bin\condor_startd.exe"
7/12 16:47:24 restarting C:\Condor\bin\condor_startd.exe in 11 seconds
7/12 16:47:35 Started DaemonCore process "C:\Condor\bin\condor_startd.exe", pid and pgroup = 3428
7/12 16:47:35 DaemonCore: Command received via UDP from host <x.x.x.x:2881>
7/12 16:47:35 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
7/12 16:47:35 The STARTD (pid 3428) exited with status -1073741521
7/12 16:47:35 Sending obituary for "C:\Condor\bin\condor_startd.exe"
7/12 16:47:35 restarting C:\Condor\bin\condor_startd.exe in 13 seconds
7/12 16:47:48 Started DaemonCore process "C:\Condor\bin\condor_startd.exe", pid and pgroup = 3752
7/12 16:47:48 DaemonCore: Command received via UDP from host <x.x.x.x:2886>
7/12 16:47:48 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
7/12 16:47:48 The STARTD (pid 3752) exited with status -1073741521
7/12 16:47:48 Sending obituary for "C:\Condor\bin\condor_startd.exe"
7/12 16:47:48 restarting C:\Condor\bin\condor_startd.exe in 17 seconds
7/12 16:48:05 Started DaemonCore process "C:\Condor\bin\condor_startd.exe", pid and pgroup = 3388
7/12 16:48:05 DaemonCore: Command received via UDP from host <x.x.x.x:2891>
7/12 16:48:05 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
7/12 16:48:05 The STARTD (pid 3388) exited with status -1073741521
7/12 16:48:05 restarting C:\Condor\bin\condor_startd.exe in 25 seconds


and so on, and so on...
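
For anyone reading a similar log: the exit status the master reports is a signed 32-bit number, and on Windows such large negative values are usually NTSTATUS codes. A minimal sketch of the conversion follows. The function names and the small lookup table are illustrative only, not part of Condor; the code names come from Microsoft's published NTSTATUS list.

```python
def exit_status_to_ntstatus(status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned NTSTATUS value."""
    return status & 0xFFFFFFFF


# A few loader-related NTSTATUS codes (illustrative subset, not exhaustive).
KNOWN_CODES = {
    0xC000007B: "STATUS_INVALID_IMAGE_FORMAT",
    0xC000012F: "STATUS_INVALID_IMAGE_NOT_MZ",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe(status: int) -> str:
    """Return the hex form of the status plus its name, if it is in the table."""
    code = exit_status_to_ntstatus(status)
    name = KNOWN_CODES.get(code, "unknown")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    # Exit status taken from the MasterLog excerpt above.
    print(describe(-1073741521))
```

Under that reading, -1073741521 is 0xC000012F, which Windows documents as an image file that does not begin with the "MZ" signature. That would be consistent with a damaged executable or DLL being loaded at startup, though the log alone does not say which file.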

I eventually narrowed the problem down to a presumably corrupt copy of PDH.DLL in the BIN folder. The only evidence of corruption was that the file's properties lacked a Version tab. Replacing the file seems to have corrected the problem, but so does simply deleting it.
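
A missing Version tab is only indirect evidence. A slightly more direct check, sketched here as a rough test rather than a full validation, is whether the file still begins with the two-byte "MZ" signature that every Windows executable and DLL starts with. The helper name is made up for illustration, and the path is an assumption pieced together from the log paths above. A file can pass this test and still be damaged elsewhere.

```python
import os


def has_mz_header(path: str) -> bool:
    """Return True if the file starts with the 'MZ' DOS/PE signature."""
    with open(path, "rb") as f:
        return f.read(2) == b"MZ"


if __name__ == "__main__":
    # Assumed location; adjust to wherever the suspect DLL actually lives.
    suspect = r"C:\Condor\bin\PDH.DLL"
    if os.path.exists(suspect):
        print(suspect, "has MZ header:", has_mz_header(suspect))
    else:
        print(suspect, "not found")
```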

My question, then: can the PDH.DLL in the BIN folder be safely deleted? It is a considerably out-of-date NT4 build (version 4.0.1314.1) compared with the one that ships with XP (version 5.1.2600.2180). If there is no PDH.DLL in the BIN folder, will Condor use the registered XP version without any subtle issues?



Mike