
Re: [Condor-users] DAGMan log file bug....



On Wed, 23 Aug 2006, Bob Mortensen wrote:

> I think I've found a bug with dagman managing the log files for a large
> DAG. Actually, it has to do with parsing the DAG and .sub files.
> Ultimately it causes the DAG to hang without ever completing. I'm running
> 6.8.0 on WindowsXP. Here are some details:
>
> I have a DAG with 82 nodes, no dependencies. In the .dag.dagman.out log
> file I can see that for a few of my nodes, the log file name is not being
> read correctly from the .sub file. A few of the pertinent lines from the
> .dag.dagman.out file are included below. Since dagman never gets the name
> correct, it is unable to read the file, so the usual ULOG events never
> show up for those nodes and it never learns that they have completed. The
> nodes' log files are created and contain reasonable information. Finally,
> if I create a DAG of a subset of the nodes, the problem goes away (or at
> least moves).

We're looking into this.

Could you also send a tarfile of your .sub files?  I'd like to get a look
at what's different between the ones that work and the ones that don't.
Also, the complete dagman.out file would be good.

Kent Wenger
Condor Team