Re: [Condor-users] ntdll.dll access_violation on windows xp sp2 + dataexecution prevention
- Date: Mon, 14 May 2007 0:25:00 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [Condor-users] ntdll.dll access_violation on windows xp sp2 + dataexecution prevention
Looking at the stack trace you gave below, it certainly looks like the Kerberos library that is linked into the Condor daemons is to blame.
On Windows, Condor links with unmodified MIT krb5 version 1.4.3 libraries. Perhaps someone at MIT could comment on why the Kerberos code is triggering DEP faults (we could point them at this thread). Or perhaps Condor should link with a more recent version of Kerberos in the hope that this bug has already been squashed.
University of Wisconsin-Madison
From: Rob de Graaf <rob@xxxxxxxxxxxxxxxxxx>
Subj: [Condor-users] ntdll.dll access_violation on windows xp sp2 + dataexecution prevention
Date: Fri May 11, 2007 12:30 pm
I've set up a test environment consisting of a central manager running
Slackware Linux and an execute-only node running Windows XP Service Pack
2, both using Condor 6.8.4. Both machines authenticate using Kerberos,
and I have successfully run some test jobs.
However, on the Windows execute node, the Condor daemons don't always
start properly. Sometimes the condor_master fails, sometimes the
condor_startd fails, and sometimes both fail. When they go down
they leave a core file in the log directory. This one is from when the
condor_master failed to start:
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 7C93426D 01:0003326D C:\WINDOWS\system32\ntdll.dll
DS:0023 ES:0023 FS:003B GS:0000
Address Frame Logical addr Module
7C93426D 00CDF374 0001:0003326D C:\WINDOWS\system32\ntdll.dll
77C2C3C9 00CDF3B4 0001:0001B3C9 C:\WINDOWS\system32\msvcrt.dll
77C2C3E7 00CDF3C0 0001:0001B3E7 C:\WINDOWS\system32\msvcrt.dll
77C2C42E 00CDF3D0 0001:0001B42E C:\WINDOWS\system32\msvcrt.dll
0034E510 00CDF3EC 0001:0000D510 C:\condor\bin\krb5_32.dll
00342FD2 00CDF420 0001:00001FD2 C:\condor\bin\krb5_32.dll
003824C7 00CDF638 0001:000414C7 C:\condor\bin\krb5_32.dll
00464C76 00CDF684 0001:00063C76 C:\condor\bin\condor_master.exe
0046440F 00CDF698 0001:0006340F C:\condor\bin\condor_master.exe
0045FDCC 00CDF70C 0001:0005EDCC C:\condor\bin\condor_master.exe
0045FB45 00CDF728 0001:0005EB45 C:\condor\bin\condor_master.exe
00458951 00CDF758 0001:00057951 C:\condor\bin\condor_master.exe
004589A7 00CDF9F8 0001:000579A7 C:\condor\bin\condor_master.exe
00439747 00CDFE58 0001:00038747 C:\condor\bin\condor_master.exe
The core file the condor_startd creates on failure has the same entries.
The condor log files show nothing unusual.
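To make the call chain in the trace easier to read, the frames can be grouped by module. The following is only a minimal sketch in Python: the names `TRACE`, `FRAME_RE`, `parse_frames` and `module_chain` are made up for illustration and are not part of Condor or any Condor tool, and the regular expression assumes the four-column layout (address, frame, section:offset, module path) shown in the core file above.

```python
import ntpath
import re
from itertools import groupby

# Stack frames copied verbatim from the core file, innermost frame first.
# A raw string keeps the Windows backslashes (e.g. "\ntdll") literal.
TRACE = r"""
7C93426D 00CDF374 0001:0003326D C:\WINDOWS\system32\ntdll.dll
77C2C3C9 00CDF3B4 0001:0001B3C9 C:\WINDOWS\system32\msvcrt.dll
77C2C3E7 00CDF3C0 0001:0001B3E7 C:\WINDOWS\system32\msvcrt.dll
77C2C42E 00CDF3D0 0001:0001B42E C:\WINDOWS\system32\msvcrt.dll
0034E510 00CDF3EC 0001:0000D510 C:\condor\bin\krb5_32.dll
00342FD2 00CDF420 0001:00001FD2 C:\condor\bin\krb5_32.dll
003824C7 00CDF638 0001:000414C7 C:\condor\bin\krb5_32.dll
00464C76 00CDF684 0001:00063C76 C:\condor\bin\condor_master.exe
0046440F 00CDF698 0001:0006340F C:\condor\bin\condor_master.exe
0045FDCC 00CDF70C 0001:0005EDCC C:\condor\bin\condor_master.exe
0045FB45 00CDF728 0001:0005EB45 C:\condor\bin\condor_master.exe
00458951 00CDF758 0001:00057951 C:\condor\bin\condor_master.exe
004589A7 00CDF9F8 0001:000579A7 C:\condor\bin\condor_master.exe
00439747 00CDFE58 0001:00038747 C:\condor\bin\condor_master.exe
"""

# Assumed layout: address, frame pointer, section:offset, module path.
FRAME_RE = re.compile(
    r"^([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\s+"
    r"([0-9A-Fa-f]{4}):([0-9A-Fa-f]{8})\s+(\S.*)$"
)


def parse_frames(text):
    """Return (address, offset, module_name) for each frame line in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            address, _frame, _section, offset, path = match.groups()
            # ntpath handles backslash-separated paths on any platform.
            frames.append((int(address, 16), int(offset, 16), ntpath.basename(path)))
    return frames


def module_chain(frames):
    """Collapse consecutive frames from the same module into (module, count)."""
    return [
        (module, len(list(group)))
        for module, group in groupby(frames, key=lambda frame: frame[2])
    ]


if __name__ == "__main__":
    for module, count in module_chain(parse_frames(TRACE)):
        print(f"{module}: {count} frame(s)")
```

Read innermost to outermost, the chain is ntdll.dll, msvcrt.dll, krb5_32.dll, condor_master.exe, i.e. the faulting call into the C runtime comes from krb5_32.dll.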
After some digging I learned about a feature called
Data Execution Prevention (system properties -> advanced -> performance
settings -> data execution prevention), which was apparently added to
Windows XP with Service Pack 2. Adding the Condor daemons to the
exception list appears to solve the problem; however, I would prefer not
to have to do that.
As far as I can tell, the problem only occurs when Condor is configured
to use Kerberos authentication. The central manager is, for testing
purposes, also the KDC, using MIT Kerberos version krb5-1.6.1. The
execute node runs a fully patched Windows XP SP2 and MIT Kerberos for
Windows version kfw-3.2.0. Both use condor-6.8.4.
Has anyone encountered this problem? What could be triggering Windows'
Data Execution Prevention? How can I avoid adding the Condor daemons to
--- message truncated ---