Re: [Condor-users] DAGman duplicating jobs on schedd restart
- Date: Thu, 3 Nov 2011 12:41:48 -0400
- From: Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] DAGman duplicating jobs on schedd restart
On Thursday, 3 November, 2011 at 9:38 AM, Christopher Martin wrote:
Whenever the schedd restarts, we get duplicate jobs showing up in the queue. For example, if we have a DAG like the following:
PARENT A CHILD B
PARENT B CHILD C D
Before the schedd restart, jobs A and B have completed and jobs C and D are queued. After the schedd restarts, C and D are still queued, but B has been added back into the queue as well. Is this a peculiarity of the DAG rescue, or could it be a conflict with the DAGMan logs?
What does DAGMan say? Is there anything in the DAG log that might be helpful?
I suspect the most likely cause for the resubmission of B is that DAGMan can't determine that it completed successfully. Is there an entry in the job log that indicates that B completed?
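One rough way to check is to scan the job's user log for termination events. The sketch below is only an illustration and rests on assumptions: that the user log is in the classic text format, where each event starts with a three-digit code followed by (cluster.proc.subproc), that code 000 is a submit event and 005 a termination event, and that DAGMan-submitted jobs carry a "DAG Node: <name>" line under the submit event. The helper name and the sample log contents are made up for the example.

```python
import re

# Assumed event header format: "005 (101.000.000) 11/03 09:05:00 Job terminated."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")
# Assumed indented note written under the submit event of a DAGMan-submitted job.
NODE_RE = re.compile(r"^\s+DAG Node: (\S+)")

SUBMIT = "000"      # assumed code for "Job submitted"
TERMINATED = "005"  # assumed code for "Job terminated"


def completed_nodes(log_text):
    """Return the DAG node names that have a termination event in the log."""
    node_of = {}      # cluster id -> DAG node name
    done = set()      # cluster ids with a termination event
    current = None    # (event code, cluster id) of the event being read
    for line in log_text.splitlines():
        event = EVENT_RE.match(line)
        if event:
            current = (event.group(1), int(event.group(2)))
            if current[0] == TERMINATED:
                done.add(current[1])
            continue
        node = NODE_RE.match(line)
        if node and current and current[0] == SUBMIT:
            node_of[current[1]] = node.group(1)
    # Fall back to the cluster id when no node name was recorded.
    return {node_of.get(c, "cluster %d" % c) for c in done}


# Hypothetical log excerpt: A was submitted and terminated, B was only submitted.
SAMPLE = """\
000 (101.000.000) 11/03 09:00:00 Job submitted from host: <10.0.0.1:9618>
    DAG Node: A
...
005 (101.000.000) 11/03 09:05:00 Job terminated.
    (1) Normal termination (return value 0)
...
000 (102.000.000) 11/03 09:06:00 Job submitted from host: <10.0.0.1:9618>
    DAG Node: B
...
"""

if __name__ == "__main__":
    print(sorted(completed_nodes(SAMPLE)))
```

If B does not appear in the result for your real log, that would be consistent with DAGMan not seeing a completion for B, although it does not by itself explain why the event is missing.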
Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools