
[Condor-users] condor_schedd died due to exception ACCESS_VIOLATION [Sec=Personal]



I am trying to set up Condor with the central manager running on a Linux box and the rest of the pool on Windows machines. The central manager seems to be up and running correctly, but when I install 6.8.2 for Windows on the submit/execute machines (e.g. 147.66.10.206), the condor_schedd process repeatedly dies, dumps core, and is then restarted by condor_master.
 
condor_status shows machines:
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@ERM-43880 LINUX       INTEL  Owner      Idle       0.000   247  0+00:29:05
vm2@ERM-43880 LINUX       INTEL  Owner      Idle       0.000   247  0+00:29:06
SCI-46433.AAD WINNT51     INTEL  Unclaimed  Idle       0.080   767  0+00:49:57

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     2     2       0         0       0          0        0
       INTEL/WINNT51     1     0       0         1       0          0        0

               Total     3     2       0         1       0          0        0
 
 
I have FULL_DEBUG turned on. The relevant logs and the core dump follow.
MasterLog
---------------
2/2 15:44:54 ******************************************************
2/2 15:44:54 ** Condor (CONDOR_MASTER) STARTING UP
2/2 15:44:54 ** C:\condor\bin\condor_master.exe
2/2 15:44:54 ** $CondorVersion: 6.8.2 Oct 12 2006 $
2/2 15:44:54 ** $CondorPlatform: INTEL-WINNT50 $
2/2 15:44:54 ** PID = 440
2/2 15:44:54 ** Log last touched 2/2 15:43:45
2/2 15:44:54 ******************************************************
2/2 15:44:54 Using config source: C:\condor\condor_config
2/2 15:44:54 Using local config sources:
2/2 15:44:54    C:\condor/condor_config.local
2/2 15:44:54 DaemonCore: Command Socket at <147.66.10.206:1034>
2/2 15:45:14 Collector port not defined, will use default: 9618
2/2 15:45:15 Started DaemonCore process "C:\condor/bin/condor_collector.exe", pid and pgroup = 948
2/2 15:45:15 Started DaemonCore process "C:\condor/bin/condor_negotiator.exe", pid and pgroup = 972
2/2 15:45:15 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 1004
2/2 15:45:16 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 1036
2/2 15:45:20 DaemonCore: Command received via UDP from host <147.66.10.206:1083>
2/2 15:45:20 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
2/2 15:45:20 The SCHEDD (pid 1004) died due to exception ACCESS_VIOLATION
2/2 15:45:20 Sending obituary for "C:\condor/bin/condor_schedd.exe"
2/2 15:45:23 restarting C:\condor/bin/condor_schedd.exe in 10 seconds
2/2 15:45:33 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2108
2/2 15:45:34 DaemonCore: Command received via UDP from host <147.66.10.206:1099>
2/2 15:45:34 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
2/2 15:45:34 The SCHEDD (pid 2108) died due to exception ACCESS_VIOLATION
2/2 15:45:34 Sending obituary for "C:\condor/bin/condor_schedd.exe"
2/2 15:45:37 restarting C:\condor/bin/condor_schedd.exe in 11 seconds
2/2 15:45:48 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2240
2/2 15:45:50 DaemonCore: Command received via UDP from host <147.66.10.206:1109>
2/2 15:45:50 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
2/2 15:45:50 The SCHEDD (pid 2240) died due to exception ACCESS_VIOLATION
 
....etc....etc
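
To summarise the restart loop in the MasterLog above without reading it line by line, here is a minimal Python sketch that pulls out every "died due to exception" entry. The function name find_daemon_deaths and the regular expression are my own, written against the log lines quoted above; they are not part of Condor, and other Condor versions may format these lines differently.

```python
import re

# Pattern written against MasterLog lines like the ones quoted above, e.g.
#   2/2 15:45:20 The SCHEDD (pid 1004) died due to exception ACCESS_VIOLATION
# This is an assumption about the log format, not an official specification.
DEATH_RE = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d+:\d+:\d+) "
    r"The (?P<daemon>\w+) \(pid (?P<pid>\d+)\) "
    r"died due to exception (?P<exc>\w+)"
)


def find_daemon_deaths(log_text):
    """Return one dict per 'died due to exception' line found in log_text."""
    deaths = []
    for line in log_text.splitlines():
        match = DEATH_RE.match(line.strip())
        if match:
            deaths.append(match.groupdict())
    return deaths


# A few lines copied from the MasterLog excerpt above.
sample = """\
2/2 15:45:20 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
2/2 15:45:20 The SCHEDD (pid 1004) died due to exception ACCESS_VIOLATION
2/2 15:45:34 The SCHEDD (pid 2108) died due to exception ACCESS_VIOLATION
2/2 15:45:50 The SCHEDD (pid 2240) died due to exception ACCESS_VIOLATION
"""

for death in find_daemon_deaths(sample):
    print(death["time"], death["daemon"], death["pid"], death["exc"])
```

Run against the full MasterLog, this gives a quick list of each schedd crash with its time and pid.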
 
ScheddLog
----------------
2/2 16:04:25 (pid:3812) ******************************************************
2/2 16:04:25 (pid:3812) ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
2/2 16:04:25 (pid:3812) ** C:\condor\bin\condor_schedd.exe
2/2 16:04:25 (pid:3812) ** $CondorVersion: 6.8.2 Oct 12 2006 $
2/2 16:04:25 (pid:3812) ** $CondorPlatform: INTEL-WINNT50 $
2/2 16:04:25 (pid:3812) ** PID = 3812
2/2 16:04:25 (pid:3812) ** Log last touched 2/2 15:55:44
2/2 16:04:25 (pid:3812) ******************************************************
2/2 16:04:25 (pid:3812) Using config source: C:\condor\condor_config
2/2 16:04:25 (pid:3812) Using local config sources:
2/2 16:04:25 (pid:3812)    C:\condor/condor_config.local
2/2 16:04:25 (pid:3812) DaemonCore: Command Socket at <147.66.10.206:1328>
2/2 16:04:25 (pid:3812) History file rotation is enabled.
2/2 16:04:25 (pid:3812)   Maximum history file size is: 20971520 bytes
2/2 16:04:25 (pid:3812)   Number of rotated history files is: 2
2/2 16:04:25 (pid:3812) my_popen: CreateProcess failed
2/2 16:04:25 (pid:3812) Failed to execute C:\condor/bin/condor_shadow.pvm.exe, ignoring
2/2 16:04:25 (pid:3812) my_popen: CreateProcess failed
2/2 16:04:25 (pid:3812) Failed to execute C:\condor/bin/condor_shadow.std.exe, ignoring
2/2 16:21:39 (pid:1004) ******************************************************
2/2 16:21:39 (pid:1004) ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
2/2 16:21:39 (pid:1004) ** C:\condor\bin\condor_schedd.exe
2/2 16:21:39 (pid:1004) ** $CondorVersion: 6.8.2 Oct 12 2006 $
2/2 16:21:39 (pid:1004) ** $CondorPlatform: INTEL-WINNT50 $
2/2 16:21:39 (pid:1004) ** PID = 1004
2/2 16:21:39 (pid:1004) ** Log last touched 2/2 16:04:25
2/2 16:21:39 (pid:1004) ******************************************************
2/2 16:21:39 (pid:1004) Using config source: C:\condor\condor_config
2/2 16:21:39 (pid:1004) Using local config sources:
2/2 16:21:39 (pid:1004)    C:\condor/condor_config.local
2/2 16:21:39 (pid:1004) DaemonCore: Command Socket at <147.66.10.206:1405>
2/2 16:21:39 (pid:1004) History file rotation is enabled.
2/2 16:21:39 (pid:1004)   Maximum history file size is: 20971520 bytes
2/2 16:21:39 (pid:1004)   Number of rotated history files is: 2
2/2 16:21:39 (pid:1004) my_popen: CreateProcess failed
2/2 16:21:39 (pid:1004) Failed to execute C:\condor/bin/condor_shadow.pvm.exe, ignoring
2/2 16:21:39 (pid:1004) my_popen: CreateProcess failed
2/2 16:21:39 (pid:1004) Failed to execute C:\condor/bin/condor_shadow.std.exe, ignoring
 
Core Dump
----------------
//=====================================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address:  004B00AD 01:000AF0AD C:\condor\bin\condor_schedd.exe
 
Registers:
EAX:33037A69
EBX:00D54A38
ECX:00D54A38
EDX:00BA0F10
ESI:0012F91C
EDI:00000400
CS:EIP:001B:004B00AD
SS:ESP:0023:0012F8C4  EBP:0012F900
DS:0023  ES:0023  FS:003B  GS:0000
Flags:00010206
 
Call stack:
Address   Frame
004B00AD  0012F8C8  ProcAPI::grabOffsets+1B
004AF400  0012F900  ProcAPI::getProcInfoRaw+FD
004AF21A  0012F984  ProcAPI::getProcInfo+1F
0048697E  0012F9A4  SelfMonitorData::CollectData+40
00495E43  0012F9E4  TimerManager::Timeout+154
0047DB9C  0012FE30  DaemonCore::Driver+212
00486127  0012FF68  dc_main+AF9
00486236  0012FF80  main+CE
 
 
Does anyone have any ideas? I cannot find a solution, and this is starting to wear me down.
