
Re: [Condor-users] dagman aborts without creating a rescue dag



R. Kent Wenger wrote:
On Fri, 5 Aug 2005, Alexander Dietz wrote:

  
I was running a DAG on my submitting machine (Red Hat Enterprise Linux
AS release 3, condor version 6.7.8), while all the jobs were to be
executed on a remote machine (Fedora Core release 3 (Heidelberg), condor
version 6.7.8). Almost the full DAG completed, but then dagman
aborted. Here are the last few lines from the dagman.out file:

...

8/4 22:04:37 ERROR "Assertion ERROR on (job->GetStatus() ==
Job::STATUS_POSTRUN || recovery)" at line 772 in file dag.C

The user proxies on both machines were still valid for a long time, and
then dagman aborted without creating a rescue DAG. Is there possibly
a bug in the file dag.C, or what's going on?
    

Yes, you hit a known bug in DAGMan.  The fix is in 6.7.10, which should be
coming out within a few days.
  

Hi,

I updated to 6.7.10 but got the same error again (with a slightly different line number)!
So what now?

Regards
Alexander Dietz