
Re: [Condor-users] DAG condor_schedd crash on windows



>It looks like your job queue log is being corrupted. The stack trace  
>you posted is from when the schedd attempted to restart. Can you  
>email the stack trace from the initial crash?

Sure, hopefully it's at the bottom of this mail.


>It looks like the commands above are being executed inside a script.  
>Can you email the exact code and the value of $dagjobid? The exact  
>parsing of the arguments is important in debugging a problem like this.

The value of the $clusterID variable is an integer. 
This code snippet was run from Maya's scripting language (MEL):

system ("condor_qedit " + $clusterID + " LeaveJobInQueue FALSE");
system ("condor_qedit -const \"DAGManJobId == \\\"" + $clusterID + "\\\" LeaveJobInQueue FALSE");
system ("condor_rm " + $clusterID);
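
Since the exact parsing of the arguments matters, the strings these three calls hand to the shell can be reproduced outside Maya. This is only a minimal Python sketch: it assumes the backslash escapes in the snippet above behave the same way in Python string literals, the function name build_commands is made up for illustration, and 123 is a placeholder cluster ID, not the real one.

```python
def build_commands(cluster_id):
    """Rebuild the three command strings with the same concatenation
    and the same escape sequences as the Maya snippet above."""
    cid = str(cluster_id)  # the Maya script concatenates the integer directly
    return [
        "condor_qedit " + cid + " LeaveJobInQueue FALSE",
        "condor_qedit -const \"DAGManJobId == \\\"" + cid + "\\\" LeaveJobInQueue FALSE",
        "condor_rm " + cid,
    ]


if __name__ == "__main__":
    # 123 is a placeholder value used only for illustration.
    for command in build_commands(123):
        print(command)
```

Printed this way, the second command contains a single unescaped double quote after -const with no matching closing quote, which may be relevant to how the arguments are parsed.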


The strange thing is that the commands execute without problems; the crash
happens afterwards.

Cheers,
Szabolcs

//=====================================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address:  0040A14E 01:0000914E C:\Condor\bin\condor_schedd.exe

Registers:
EAX:00B7C3D4
EBX:00000000
ECX:0000119D
EDX:7C90EB94
ESI:0000119D
EDI:0000119D
CS:EIP:001B:0040A14E
SS:ESP:0023:0012FC08  EBP:0012FC0C
DS:0023  ES:0023  FS:003B  GS:0000
Flags:00010206

Call stack:
Address   Frame
0040A14E  0012FC0C  DestroyProc+1EB
0040A0D4  0012FD34  DestroyProc+171
004116BA  0012FD68  jobIsFinishedDone+3B
0041C4C4  0012FDA0  Scheduler::jobIsFinishedHandler+73
00478442  0012FDB8  SelfDrainingQueue::timerHandler+6C
00485C92  0012FDF4  TimerManager::Timeout+14D
0046FED3  0012FE30  DaemonCore::Driver+B5
00477FA6  0012FF68  dc_main+A44
004780B5  0012FF80  main+CE
0049B9BD  00000001  mainCRTStartup+C5