
[Condor-users] DAG condor_schedd crash on windows



I keep receiving condor_schedd crash emails whenever a DAGMan scheduler job
that had been set to stay in the queue is removed from it. (This is on a Windows machine.)

I use the following commands to remove the whole DAG:

# Make the scheduler job removable
condor_qedit $dagjobid LeaveJobInQueue FALSE
# Make all of the DAG's jobs removable
condor_qedit -const "DAGManJobId == $dagjobid" LeaveJobInQueue FALSE
condor_rm $dagjobid

The crash happens every time, but the jobs are removed nicely.
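
For anyone who wants to reproduce this from a script, the same three steps can be sketched in Python. This is only a minimal sketch: the function names `dag_removal_commands` and `remove_dag` are made up for illustration, the commands and arguments are exactly the ones listed above, and it assumes the Condor binaries are on the PATH.

```python
import subprocess


def dag_removal_commands(dag_job_id):
    """Return the three commands from the post, in order, as argument lists."""
    job_id = str(dag_job_id)
    return [
        # Allow the DAGMan scheduler job itself to leave the queue.
        ["condor_qedit", job_id, "LeaveJobInQueue", "FALSE"],
        # Allow every job whose DAGManJobId matches to leave the queue.
        ["condor_qedit", "-const", f"DAGManJobId == {job_id}",
         "LeaveJobInQueue", "FALSE"],
        # Remove the DAG.
        ["condor_rm", job_id],
    ]


def remove_dag(dag_job_id, run=subprocess.run):
    """Run the commands in order; check=True stops at the first failure."""
    for argv in dag_removal_commands(dag_job_id):
        run(argv, check=True)
```

Passing the commands as argument lists avoids shell quoting problems with the constraint expression, and the `run` parameter makes it possible to print the commands in a dry run instead of executing them.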


Cheers,
Szabolcs

---

Just an example:


This is an automated email from the Condor system
on machine "snoopy.digicpictures.local".  Do not reply.

"C:\Condor/bin/condor_schedd.exe" on "snoopy.digicpictures.local" died due to exception ACCESS_VIOLATION.

Condor will automatically restart this process in 17 seconds.

*** Last 20 line(s) of file SchedLog:
8/15 10:08:40 ** $CondorPlatform: INTEL-WINNT50 $
8/15 10:08:40 ** PID = 3564
8/15 10:08:40 ******************************************************
8/15 10:08:40 Using config file: C:\Condor\condor_config
8/15 10:08:40 Using local config files: C:\Condor/condor_config.local
8/15 10:08:40 DaemonCore: Command Socket at <192.168.0.71:1122>
8/15 10:08:41 "C:\Condor/bin/condor_shadow.pvm -classad" did not produce any output, ignoring
8/15 10:08:41 "C:\Condor/bin/condor_shadow.std -classad" did not produce any output, ignoring
8/15 10:09:09 ******************************************************
8/15 10:09:09 ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
8/15 10:09:09 ** C:\Condor\bin\condor_schedd.exe
8/15 10:09:09 ** $CondorVersion: 6.7.9 Jul 14 2005 $
8/15 10:09:09 ** $CondorPlatform: INTEL-WINNT50 $
8/15 10:09:09 ** PID = 3264
8/15 10:09:09 ******************************************************
8/15 10:09:09 Using config file: C:\Condor\condor_config
8/15 10:09:09 Using local config files: C:\Condor/condor_config.local
8/15 10:09:09 DaemonCore: Command Socket at <192.168.0.71:1150>
8/15 10:09:09 "C:\Condor/bin/condor_shadow.pvm -classad" did not produce any output, ignoring
8/15 10:09:09 "C:\Condor/bin/condor_shadow.std -classad" did not produce any output, ignoring
*** End of file SchedLog

*** Last entry in core file core.SCHEDD.WIN32

================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address:  0049B018 01:0009A018 C:\Condor\bin\condor_schedd.exe

Registers:
EAX:000000FF
EBX:00000000
ECX:018AF5D0
EDX:0052F880
ESI:29300030
EDI:00971410
CS:EIP:001B:0049B018
SS:ESP:0023:0012F420  EBP:0012F430
DS:0023  ES:0023  FS:003B  GS:0000
Flags:00010286

Call stack:
Address   Frame
0049B018  0012F430  stricmp+88
0046242F  0012F444  AttrList::Lookup+1F
0046240D  0012F44C  AttrList::Lookup+9
004624F2  0012F454  AttrList::Lookup+C
0046216B  0012F474  AttrList::Insert+34
0046211C  0012F48C  AttrList::Insert+2E
004496F6  0012F4BC  LogSetAttribute::Play+8F
00448861  0012F4E4  ClassAdLog::ClassAdLog+CD
00446D42  0012F52C  ClassAdCollection::ClassAdCollection+22
00409343  0012FB90  InitJobQueue+60
0041F321  0012FE28  main_init+131
004778B6  0012FF5C  dc_main+A26
004B6C26  00000001  EnumProcessModules+3D02

*** End of file core.SCHEDD.WIN32