
Re: [Condor-users] DAG condor_schedd crash on windows



On Sep 22, 2005, at 6:47 AM, Horvatth Szabolcs wrote:

I receive a condor_schedd crash error email every time a DAGMan scheduler job
that had been set to stay in the queue is removed from the queue. (This is on a Windows computer.)


I use the following commands to remove the whole DAG:
        {
// Set scheduler task "removeable"
condor_qedit $dagjobid LeaveJobInQueue FALSE")
// Set all tasks "removeable"
condor_qedit -const "DAGManJobId == $dagjobid" LeaveJobInQueue FALSE
condor_rm $dagjobid

The crash happens every time, but the jobs are removed nicely.

It looks like your job queue log is being corrupted. The stack trace you posted is from when the schedd attempted to restart. Can you email the stack trace from the initial crash?


It looks like the commands above are being executed inside a script. Can you email the exact code and the value of $dagjobid? The exact parsing of the arguments is important in debugging a problem like this.

+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+