
Re: [Condor-users] ntdll.dll access_violation on windows xp sp2 + dataexecution prevention



I don't know how DEP affects Condor, but we had issues with DEP in other
areas on both our 2003 servers and our XP workstations. You can change
DEP to OptIn instead of OptOut. The easiest way to do this is to edit
c:\boot.ini

Look for the bit that says "/noexecute=optout" and change it to
"/noexecute=optin", then reboot.

I automated this easily using a regex within a Perl script. Just make
sure that the regex is NOT case sensitive, as I have seen optin/optout
written in mixed case as well as all lowercase.

1) Take a backup of boot.ini
2) attrib -s -h -r $bootIniFile
3) Apply the regex to boot.ini: s/noexecute=optout/noexecute=OptIn/i
4) attrib +s +h -a $bootIniFile
5) Reboot
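
For anyone who would rather not use Perl, steps 1 and 3 above could be sketched in Python roughly as follows. This is only an illustration, not the script I actually ran: the function names and the ".bak" backup suffix are made up for the example, and it assumes the attrib calls in steps 2 and 4 are run separately around it. The file is handled as raw bytes so the existing line endings are left alone.

```python
import re
import shutil
from pathlib import Path


def switch_dep_to_optin(boot_ini_text: str) -> str:
    """Mirror s/noexecute=optout/noexecute=OptIn/i from step 3.

    count=1 replaces only the first match, like the Perl substitution
    without the /g modifier.
    """
    return re.sub(
        r"noexecute=optout",
        "noexecute=OptIn",
        boot_ini_text,
        count=1,
        flags=re.IGNORECASE,  # optin/optout shows up in mixed case
    )


def patch_boot_ini(path: Path) -> bool:
    """Back up and rewrite the file; return True if it was changed.

    Assumes the system/hidden/read-only attributes have already been
    cleared (step 2) and will be restored by the caller (step 4).
    """
    raw = path.read_bytes()
    # latin-1 maps every byte to one character, so the round trip is lossless.
    text = raw.decode("latin-1")
    patched = switch_dep_to_optin(text)
    if patched == text:
        return False  # nothing to do, so leave the file and any backup alone

    # Hypothetical backup name: boot.ini -> boot.ini.bak (step 1).
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    path.write_bytes(patched.encode("latin-1"))
    return True
```

Calling patch_boot_ini(Path(r"c:\boot.ini")) between the two attrib commands would cover steps 1 and 3. Try it on a copy of the file first.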

Remember, always be careful when editing boot.ini.

This should work for both XP and 2003.

Michael McClenahan

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Rob de Graaf
Sent: 11 May 2007 18:31
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] ntdll.dll access_violation on windows xp sp2 +
dataexecution prevention

Hello,

I've set up a test environment including a central manager running
Slackware linux and an execute only node running Windows XP service pack
2, both using condor 6.8.4. Both machines authenticate using kerberos
and I have successfully run some test jobs.

However, on the Windows execute node, the condor daemons don't always
start properly. Sometimes the condor_master will fail, sometimes the
condor_startd will fail, sometimes they both fail. When they go down
they leave a core file in the log directory. This one is from when the
condor_master failed to start:

    core.MASTER.win32
    //=====================================================
    Exception code: C0000005 ACCESS_VIOLATION
    Fault address:  7C93426D 01:0003326D C:\WINDOWS\system32\ntdll.dll

    Registers:
    EAX:FFFFFFFF
    EBX:00000362
    ECX:000320F0
    EDX:00000000
    ESI:000303A8
    EDI:000305D8
    CS:EIP:001B:7C93426D
    SS:ESP:0023:00CDF154  EBP:00CDF374
    DS:0023  ES:0023  FS:003B  GS:0000
    Flags:00010246

    Call stack:
    Address   Frame     Logical addr  Module
    7C93426D  00CDF374  0001:0003326D C:\WINDOWS\system32\ntdll.dll
    77C2C3C9  00CDF3B4  0001:0001B3C9 C:\WINDOWS\system32\msvcrt.dll
    77C2C3E7  00CDF3C0  0001:0001B3E7 C:\WINDOWS\system32\msvcrt.dll
    77C2C42E  00CDF3D0  0001:0001B42E C:\WINDOWS\system32\msvcrt.dll
    0034E510  00CDF3EC  0001:0000D510 C:\condor\bin\krb5_32.dll
    00342FD2  00CDF420  0001:00001FD2 C:\condor\bin\krb5_32.dll
    003824C7  00CDF638  0001:000414C7 C:\condor\bin\krb5_32.dll
    00464C76  00CDF684  0001:00063C76 C:\condor\bin\condor_master.exe
    0046440F  00CDF698  0001:0006340F C:\condor\bin\condor_master.exe
    0045FDCC  00CDF70C  0001:0005EDCC C:\condor\bin\condor_master.exe
    0045FB45  00CDF728  0001:0005EB45 C:\condor\bin\condor_master.exe
    00458951  00CDF758  0001:00057951 C:\condor\bin\condor_master.exe
    004589A7  00CDF9F8  0001:000579A7 C:\condor\bin\condor_master.exe
    00439747  00CDFE58  0001:00038747 C:\condor\bin\condor_master.exe

The core file the condor_startd creates on failure has the same entries.

The condor log files show nothing unusual.

After some digging I learned about the existence of something called
Data Execution Prevention (system properties -> advanced -> performance
settings -> data execution prevention) which was apparently added to
Windows XP with service pack 2. Adding the condor daemons to the
exception list appears to solve the problem, however I would prefer not
to have to do that.

As far as I can tell, the problem only occurs when condor is configured
to use kerberos authentication. The central manager is, for testing
purposes, also the KDC, using MIT kerberos version krb5-1.6.1. The
execute node runs a fully patched Windows XP SP2 and MIT kerberos for
windows version kfw-3.2.0. Both use condor-6.8.4.

Has anyone encountered this problem? What could be triggering windows' 
data execution prevention? How can I avoid adding the condor daemons to
the exception list?

Thanks in advance,

Rob de Graaf