
Re: [Condor-users] DAGman duplicating jobs on schedd restart

The DAGs do indeed have their own log files; each DAG logs to its own directory.

I've attached all of the logs you asked for. I've just started using this mailing list, so I'm hoping it allows attachments.


On 3 November 2011 15:19, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
On Thu, 3 Nov 2011, Christopher Martin wrote:

So from what I can see it's like you say, it's as if the dagman can't tell
that the jobs have completed successfully. The job logs do indicate
completion though. I'm wondering, do the jobs all have to log to the same
log file? Currently I have each job logging to its own log file. All logs
for both the jobs and the dagman are logging to the same directory.
I've included snippets from a dagman.out that shows the state of things
before and after the schedd restart.

It's fine to have any combination of jobs logging to their own log files vs. jobs logging to a common log file.  It's important, though, that jobs in separate DAGs not share log files (unless you're 100% sure the DAGs won't be run at the same time).
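
To illustrate the rule above, here is a minimal sketch of a check for log files that are shared between DAGs. It assumes the usual `JOB <node name> <submit file>` lines in the DAG file and a `log = <path>` line in each submit file. The function names and the in-memory dictionaries are hypothetical, chosen only for this example. It compares log paths as plain strings, so it does not expand submit-file macros, resolve relative paths, or account for per-node directories.

```python
import re
from collections import defaultdict

def submit_files(dag_text):
    """Return the submit-file names from the JOB lines of a DAG file."""
    files = []
    for line in dag_text.splitlines():
        parts = line.split()
        # A node is declared as: JOB <node name> <submit file> [options]
        if len(parts) >= 3 and parts[0].upper() == "JOB":
            files.append(parts[2])
    return files

def log_file(submit_text):
    """Return the value of the first 'log = ...' line, or None if absent."""
    for line in submit_text.splitlines():
        match = re.match(r"\s*log\s*=\s*(\S.*)", line, re.IGNORECASE)
        if match:
            return match.group(1).strip()
    return None

def logs_shared_between_dags(dags, submits):
    """dags: {DAG name: DAG file text}; submits: {submit file name: text}.

    Return {log file: sorted DAG names} for logs used by more than one DAG.
    """
    users = defaultdict(set)
    for dag_name, dag_text in dags.items():
        for sub in submit_files(dag_text):
            log = log_file(submits.get(sub, ""))
            if log is not None:
                users[log].add(dag_name)
    return {log: sorted(names) for log, names in users.items() if len(names) > 1}
```

An empty result means no log file is named by jobs in more than one DAG. Jobs within a single DAG may still share a log, which is fine.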

Can you send the following files?
* dagman.out
* the actual dag file
* the node job log files

If you do that, I'll take a look in more detail and see what I can figure out.

From your original email, it sounds like this problem happens consistently
when your schedd restarts -- is that right?  If so, that eliminates the things that would be my first guesses as to the problem (e.g., some kind of transient log file reading error).

Kent Wenger
Condor Team

Attachment: dagmanlogs.tar.gz
Description: GNU Zip compressed data