R. Kent Wenger wrote:
On Fri, 5 Aug 2005, Alexander Dietz wrote:

I was running a DAG on my submit machine (Red Hat Enterprise Linux AS release 3, Condor version 6.7.8), while all the jobs were to be executed on a remote machine (Fedora Core release 3 (Heidelberg), Condor version 6.7.8). Almost the whole DAG completed, but then DAGMan aborted. Here are the last few lines from the dagman.out file:

...
8/4 22:04:37 ERROR "Assertion ERROR on (job->GetStatus() == Job::STATUS_POSTRUN || recovery)" at line 772 in file dag.C

The user proxies on both machines were still valid for a long time, yet DAGMan aborted without creating a rescue DAG. Is there possibly a bug in the file dag.C, or what's going on?

Yes, you hit a known bug in DAGMan. The fix is in 6.7.10, which should be coming out within a few days.
I updated to 6.7.10, but I got the same error again (with a slightly different line number)!
So what now?