
Re: [Condor-users] DAGMan log file bug....



Thanks Kent,

I will send under separate cover, directly to you.

Bob


On Thu, 24 Aug 2006 07:53:40 -0700, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:

On Wed, 23 Aug 2006, Bob Mortensen wrote:

I think I've found a bug with dagman managing the log files for a large
DAG. Actually, it has to do with parsing the DAG and .sub files.
Ultimately it causes the DAG to hang without ever completing. I'm running
6.8.0 on Windows XP. Here are some details:

I have a DAG with 82 nodes, no dependencies. In the .dag.dagman.out log
file I can see that for a few of my nodes, the log file name is not being
read correctly from the .sub file. A few of the pertinent lines from the
.dag.dagman.out file are included below. Since dagman never gets the name
correct, it is unable to read the file, so the usual ULOG events never
show up for those nodes and it doesn't know that they have completed. The
nodes' log files are created and contain reasonable information. Finally,
if I create a DAG of a subset of the nodes, the problem goes away (or at
least moves).

We're looking into this.

Could you also send a tarfile of your .sub files?  I'd like to get a look
at what's different between the ones that work and the ones that don't.
Also, the complete dagman.out file would be good.
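
For comparing the .sub files that work with the ones that don't, a small
script along these lines may help. It is only a sketch, not DAGMan's own
parser: it assumes each submit file names its log on a single
"log = <path>" line, treats lines starting with "#" as comments, and
ignores macros and other submit-file features. The function names
find_log_names and summarize are made up for illustration.

```python
import re

# Matches a line of the form "log = <path>"; the command name is
# matched case-insensitively and surrounding whitespace is dropped.
LOG_LINE = re.compile(r"^\s*log\s*=\s*(.*?)\s*$", re.IGNORECASE)


def find_log_names(submit_text):
    """Return every value assigned to the log command in a submit file's text."""
    names = []
    for line in submit_text.splitlines():
        # Skip comment lines so a commented-out log entry is not reported.
        if line.lstrip().startswith("#"):
            continue
        match = LOG_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def summarize(submit_files):
    """Map each file name to the list of log values found in its text.

    submit_files is a dict of {file name: file contents}.
    """
    return {name: find_log_names(text) for name, text in submit_files.items()}
```

Reading each .sub file in the DAG directory into a dict and passing it to
summarize gives one list of log names per file, which can then be checked
against the names that appear in the dagman.out file. A file with an empty
list, or with more than one entry, would be worth a closer look.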

Kent Wenger
Condor Team
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR