Re: [Condor-users] Dagman exits, restart and hangs....
- Date: Wed, 31 Jan 2007 12:17:54 -0800
- From: Robert Mortensen <bobm@xxxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Dagman exits, restart and hangs....
Thanks, I have more info which I will forward directly to you...
On Jan 31, 2007, at 8:40 AM, R. Kent Wenger wrote:
On Tue, 30 Jan 2007, Robert Mortensen wrote:
I'm having a problem with dagman on an all Windows XP pool. Basically
what happens, occasionally, is that a dagman job exits before
completing all nodes. It is then restarted and it completes the
remaining nodes, but then hangs waiting, I think, for some "phantom"
node to complete. There are three problems:
1 - dagman appears to exit for no reason, with no errors in any logs
that I can find
2 - after recovering, dagman hangs after all the nodes have been
submitted and completed
3 - the delay in dagman recovering is nearly 1 hour
We're looking into this.
One thing that might help would be to also have the
and master.dag.lib.out files if you still have them.
Also, it would help if you increased the verbosity of the DAGMan
and sent the resulting dagman.out file when/if this happens again.
There are two separate verbosity controls (that control different
Please do the following:
- Add the setting '-debug 5' on your condor_submit_dag command line.
- Set the configuration macro DAGMAN_DEBUG to D_FULLDEBUG. You can do
this in a couple of ways:
- Put 'DAGMAN_DEBUG = D_FULLDEBUG' into an appropriate
- Set _CONDOR_DAGMAN_DEBUG to D_FULLDEBUG in your environment
- You can address number 3 by setting the
configuration macro to a value shorter than the default (which is
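
For anyone scripting the two verbosity settings above, here is a minimal Python sketch of one way to combine them. It only builds the command line and environment; it does not run anything. The helper name build_submit_dag_call and the DAG file name master.dag are assumptions made for illustration (the file name is guessed from the master.dag.lib.out file mentioned earlier), and the sketch assumes the environment of condor_submit_dag is what DAGMan ends up seeing.

```python
import os


def build_submit_dag_call(dag_file, base_env=None):
    """Return (command, environment) for a verbose condor_submit_dag run.

    Hypothetical helper, not part of Condor. It applies the two settings
    suggested in the thread:
      - '-debug 5' on the condor_submit_dag command line
      - _CONDOR_DAGMAN_DEBUG=D_FULLDEBUG in the environment
    """
    # Copy the environment so the caller's own settings are left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["_CONDOR_DAGMAN_DEBUG"] = "D_FULLDEBUG"

    # Passing a list avoids shell quoting problems on Windows and Unix alike.
    cmd = ["condor_submit_dag", "-debug", "5", dag_file]
    return cmd, env
```

To actually submit, the pair can be handed to subprocess.run(cmd, env=env, check=True) on a machine where condor_submit_dag is on the PATH.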