
Re: [Condor-users] Having a problem with 7.4.1 on windows - solved?



Answering my own question (would any condor developers care to comment?).
 
After much troubleshooting it became clear that the problem was related to the startd:
a config file with only the master, or only the master and schedd, worked fine, but whenever the startd
was included in the daemon list the same exception error occurred. I had noticed that a keyboard
daemon, kbdd, was included in the MSI-generated config file, so I added that as well (without startd), and
Condor started and did not crash. The kbddLog file had a message saying it was aborting because it
couldn't detect the startd running. Aha. So I then included startd as well, and all seems OK.
 
It seems that in the Windows 7.4.* series condor_kbdd.exe now needs to be started from the daemon list
as a "helper" for the startd, whereas it didn't exist in the 7.2.* series (although there was a
condor_kbdd_dll.dll file). I've had a quick look but can't find any reference to this new
"requirement" in the release notes on the Condor web site (or anywhere else). It also seems a bit
extreme that leaving it out of the daemon list when the startd is present causes Condor to crash with
an exception error rather than log a message to the log file and abort.
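
As a rough illustration of the workaround described above, the daemon list in the config file might change as follows. The exact list of daemons is an assumption (the real config file is not shown in this thread); the only point is that KBDD appears alongside STARTD. If the older config file does not say where condor_kbdd.exe lives, that setting may also need to be copied from the MSI-generated config file, which is likewise an assumption rather than something confirmed here.

```
# Hypothetical condor_config fragment; the daemon names other than
# STARTD and KBDD are assumed for illustration.
#
# Before (crashed the 7.4.1 master on Windows):
#   DAEMON_LIST = MASTER, SCHEDD, STARTD
#
# After (kbdd listed together with the startd):
DAEMON_LIST = MASTER, SCHEDD, STARTD, KBDD
```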
 
Hopefully this helps others if they encounter this problem, assuming I've got it right! :) Condor Team?
 
Cheers
 
Greg

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Wednesday, 13 January 2010 1:16 PM
To: condor-users@xxxxxxxxxxx
Subject: [ExternalEmail] Re: [Condor-users] Having a problem with 7.4.1 on windows

Bit more info.
 
Installing 7.4.1 manually on the local PC with the MSI file works OK,
i.e. Condor comes up, runs, and joins the pool fine.
If we stop Condor, replace the config with our working 7.2.4 config file, and start
Condor again, we are back to the behaviour described previously. For reference,
here is the MasterLog when using the MSI-generated config file that works OK.
 
01/13 12:25:08 UnsetEnv(NET_REMAP_ENABLE): SetEnvironmentVariable failed, errno=203
01/13 12:25:08 Locale: English_United States.1252
01/13 12:25:08 ******************************************************
01/13 12:25:08 ** Condor (CONDOR_MASTER) STARTING UP
01/13 12:25:08 ** C:\Program Files\condor\bin\condor_master.exe
01/13 12:25:08 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
01/13 12:25:08 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
01/13 12:25:08 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/13 12:25:08 ** $CondorPlatform: INTEL-WINNT50 $
01/13 12:25:08 ** PID = 3676
01/13 12:25:08 ** Log last touched time unavailable (No such file or directory)
01/13 12:25:08 ******************************************************
01/13 12:25:08 Using config source: C:\Program Files\condor\condor_config
01/13 12:25:08 Using local config sources:
01/13 12:25:08    C:\PROGRA~1\condor/condor_config.local
01/13 12:25:08 DaemonCore: Command Socket at <130.116.144.59:1199>
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_schedd.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_shadow.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_gridmanager.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_c-gahp.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_c-gahp_worker_thread.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_startd.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_kbdd.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_starter.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_vm-gahp.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin\condor_dagman.exe is now enabled in the firewall.
01/13 12:25:08 Started DaemonCore process "C:\PROGRA~1\condor/bin/condor_schedd.exe", pid and pgroup = 4016
01/13 12:25:09 Started DaemonCore process "C:\PROGRA~1\condor/bin/condor_startd.exe", pid and pgroup = 3808
01/13 12:25:09 Started DaemonCore process "C:\PROGRA~1\condor/bin/condor_kbdd.exe", pid and pgroup = 2828

How can a different config file (that has worked for 7.2.4 and many previous versions)
cause access violations, exceptions and core file dumps?
 
 
Cheers
 
Greg

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Tuesday, 12 January 2010 4:38 PM
To: condor-users@xxxxxxxxxxx
Subject: [ExternalEmail] [Condor-users] Having a problem with 7.4.1 on windows

Hi All
 
Just testing upgrading from 7.2.4 to 7.4.1 on a few Windows machines
before applying to our pool/s. We have just downloaded the zip file,
unzipped it and copied to the PCs, along with the appropriate config files
(and after net stop condor).
 
(Our normal distribution is via a file server with the latest version and a
scheduled task on the PCs that checks every day and downloads if there
is a different version on the server to that on the local PC).
 
For testing we do this manually. We then net start condor and.......
 
In each case we get the following MasterLog file and CORE.Master.Win32 file.
 
Any suggestions/ideas?
 
Thanks
 
Cheers
 
Greg
 
01/12 14:57:05 UnsetEnv(NET_REMAP_ENABLE): SetEnvironmentVariable failed, errno=203
01/12 14:57:05 Locale: English_United States.1252
01/12 14:57:05 ******************************************************
01/12 14:57:05 ** Condor (CONDOR_MASTER) STARTING UP
01/12 14:57:05 ** c:\PROGRA~1\condor\bin\condor_master.exe
01/12 14:57:05 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
01/12 14:57:05 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
01/12 14:57:05 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/12 14:57:05 ** $CondorPlatform: INTEL-WINNT50 $
01/12 14:57:05 ** PID = 7880
01/12 14:57:05 ** Log last touched time unavailable (No such file or directory)
01/12 14:57:05 ******************************************************
01/12 14:57:05 Using config source: c:\PROGRA~1\condor\condor_config
01/12 14:57:05 Using local config sources:
01/12 14:57:05    C:\PROGRA~1\condor/condor_config.local
01/12 14:57:05 DaemonCore: Command Socket at <130.116.146.130:9391>
01/12 14:57:05 Authorized application C:\PROGRA~1\condor/bin/condor_startd.exe is now enabled in the firewall.
01/12 14:57:05 Intercepting an unhandled exception.
01/12 14:57:05 Dropping a core file.

//=====================================================
PID: 7880
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 00493720 01:00092720 c:\PROGRA~1\condor\bin\condor_master.exe
Registers:
EAX:00000001
EBX:00D60700
ECX:00000000
EDX:7C90E514
ESI:00C9FE98
EDI:00000400
CS:EIP:001B:00493720
SS:ESP:0023:00C9FDF0 EBP:00C9FE1C
DS:0023 ES:0023 FS:003B GS:0000
Flags:00010246
Call stack:
Address Frame
00493720 00C9FDEC strlen (f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\strlen.asm:81)
00465FE5 00C9FE1C WindowsFirewallHelper::charToBstr (c:\condor\execute\dir_5488\userdir\src\condor_c++_util\firewall.windows.cpp:403)
00466467 00C9FE38 WindowsFirewallHelper::addTrusted (c:\condor\execute\dir_5488\userdir\src\condor_c++_util\firewall.windows.cpp:135)
0040368C 00564940 init_firewall_exceptions (c:\condor\execute\dir_5488\userdir\src\condor_master.v6\master.cpp:1420)
//=====================================================