
[Condor-users] ntdll.dll access_violation on windows xp sp2 + data execution prevention


I've set up a test environment consisting of a central manager running Slackware Linux and an execute-only node running Windows XP Service Pack 2, both using Condor 6.8.4. Both machines authenticate using Kerberos, and I have successfully run some test jobs.

However, on the Windows execute node the Condor daemons don't always start properly. Sometimes the condor_master fails, sometimes the condor_startd fails, and sometimes both fail. When they go down they leave a core file in the log directory. This one is from a failed condor_master start:

   Exception code: C0000005 ACCESS_VIOLATION
   Fault address:  7C93426D 01:0003326D C:\WINDOWS\system32\ntdll.dll

   SS:ESP:0023:00CDF154  EBP:00CDF374
   DS:0023  ES:0023  FS:003B  GS:0000

   Call stack:
   Address   Frame     Logical addr  Module
   7C93426D  00CDF374  0001:0003326D C:\WINDOWS\system32\ntdll.dll
   77C2C3C9  00CDF3B4  0001:0001B3C9 C:\WINDOWS\system32\msvcrt.dll
   77C2C3E7  00CDF3C0  0001:0001B3E7 C:\WINDOWS\system32\msvcrt.dll
   77C2C42E  00CDF3D0  0001:0001B42E C:\WINDOWS\system32\msvcrt.dll
   0034E510  00CDF3EC  0001:0000D510 C:\condor\bin\krb5_32.dll
   00342FD2  00CDF420  0001:00001FD2 C:\condor\bin\krb5_32.dll
   003824C7  00CDF638  0001:000414C7 C:\condor\bin\krb5_32.dll
   00464C76  00CDF684  0001:00063C76 C:\condor\bin\condor_master.exe
   0046440F  00CDF698  0001:0006340F C:\condor\bin\condor_master.exe
   0045FDCC  00CDF70C  0001:0005EDCC C:\condor\bin\condor_master.exe
   0045FB45  00CDF728  0001:0005EB45 C:\condor\bin\condor_master.exe
   00458951  00CDF758  0001:00057951 C:\condor\bin\condor_master.exe
   004589A7  00CDF9F8  0001:000579A7 C:\condor\bin\condor_master.exe
   00439747  00CDFE58  0001:00038747 C:\condor\bin\condor_master.exe

The core file the condor_startd creates on failure has the same entries. The condor log files show nothing unusual.
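
For anyone who wants to compare the two core files, here is a minimal Python sketch that parses the call stack above, lists the order in which the modules appear, and subtracts each logical offset from its address. It is not part of Condor; the function names parse_frames, module_chain and section_starts are made up for illustration, and it assumes every frame line has the four-column layout shown in the dump.

```python
import ntpath
import re
from itertools import groupby

# Call stack copied from the condor_master core file above.
CALL_STACK = r"""
7C93426D  00CDF374  0001:0003326D C:\WINDOWS\system32\ntdll.dll
77C2C3C9  00CDF3B4  0001:0001B3C9 C:\WINDOWS\system32\msvcrt.dll
77C2C3E7  00CDF3C0  0001:0001B3E7 C:\WINDOWS\system32\msvcrt.dll
77C2C42E  00CDF3D0  0001:0001B42E C:\WINDOWS\system32\msvcrt.dll
0034E510  00CDF3EC  0001:0000D510 C:\condor\bin\krb5_32.dll
00342FD2  00CDF420  0001:00001FD2 C:\condor\bin\krb5_32.dll
003824C7  00CDF638  0001:000414C7 C:\condor\bin\krb5_32.dll
00464C76  00CDF684  0001:00063C76 C:\condor\bin\condor_master.exe
0046440F  00CDF698  0001:0006340F C:\condor\bin\condor_master.exe
0045FDCC  00CDF70C  0001:0005EDCC C:\condor\bin\condor_master.exe
0045FB45  00CDF728  0001:0005EB45 C:\condor\bin\condor_master.exe
00458951  00CDF758  0001:00057951 C:\condor\bin\condor_master.exe
004589A7  00CDF9F8  0001:000579A7 C:\condor\bin\condor_master.exe
00439747  00CDFE58  0001:00038747 C:\condor\bin\condor_master.exe
"""

# Columns: address, frame pointer, section:offset, module path.
FRAME_RE = re.compile(
    r"^\s*([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\s+"
    r"([0-9A-Fa-f]{4}):([0-9A-Fa-f]{8})\s+(\S.*?)\s*$"
)


def parse_frames(text):
    """Return one dict per frame line; lines that do not match are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue
        address, frame, section, offset, path = m.groups()
        frames.append({
            "address": int(address, 16),
            "frame": int(frame, 16),
            "section": int(section, 16),
            "offset": int(offset, 16),
            # ntpath handles Windows paths on any platform.
            "module": ntpath.basename(path),
        })
    return frames


def module_chain(frames):
    """Collapse consecutive frames in the same module, innermost first."""
    return [name for name, _ in groupby(f["module"] for f in frames)]


def section_starts(frames):
    """Map each module to the set of (address - logical offset) values."""
    starts = {}
    for f in frames:
        starts.setdefault(f["module"], set()).add(f["address"] - f["offset"])
    return starts


if __name__ == "__main__":
    frames = parse_frames(CALL_STACK)
    print(" <- ".join(module_chain(frames)))
    for module, values in section_starts(frames).items():
        print(module, [hex(v) for v in sorted(values)])
```

Running the same functions on the condor_startd dump and comparing the module chain is a quick check that both daemons fail along the same path. For the dump above, the fault in ntdll.dll is reached from condor_master.exe by way of krb5_32.dll and msvcrt.dll.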

After some digging I learned about Data Execution Prevention (System Properties -> Advanced -> Performance Settings -> Data Execution Prevention), which was apparently added to Windows XP with Service Pack 2. Adding the Condor daemons to the exception list appears to solve the problem; however, I would prefer not to have to do that.

As far as I can tell, the problem only occurs when Condor is configured to use Kerberos authentication. The central manager is, for testing purposes, also the KDC, running MIT Kerberos version krb5-1.6.1. The execute node runs a fully patched Windows XP SP2 and MIT Kerberos for Windows version kfw-3.2.0. Both use condor-6.8.4.

Has anyone encountered this problem? What could be triggering Windows' Data Execution Prevention? How can I avoid adding the Condor daemons to the exception list?

Thanks in advance,

Rob de Graaf