Re: [Condor-users] DAG condor_schedd crash on windows

>>> It looks like your job queue log is being corrupted. The stack trace
>>> you posted is from when the schedd attempted to restart. Can you
>>> email the stack trace from the initial crash?
>> The strange thing is that the command is executed without problems,  
>> the crash happens afterwards.
>Are you still having crashing problems? I took a look at the stack  
>trace and didn't see anything obviously wrong.

Yes. With 6.7.12 I seem to have fewer crashes, but they still happen maybe 1 out of 10 times.

And I found another strange thing: sometimes when DAGMan jobs are removed from the queue,
some of the jobs (already submitted by DAGMan) stay in the queue and continue running instead
of terminating with the DAGMan job. And if the crash happens before the job is finished (which
happens once in a while), the rest of the DAG is not submitted at all.