
Re: [Condor-users] Recovering from failures of DAGs within DAGs



Hi Kent,

Thanks for your answer; I'll have a go at this.

Cheers,

Craig.

On Thu, 22 Dec 2005, R. Kent Wenger wrote:

> On Wed, 21 Dec 2005, Craig Robinson wrote:
>
> > We are developing a DAGMan application which will ideally use DAGs
> > within DAGs. We have seen in the Condor documentation that such
> > applications are supported. How are failures of internal DAGs dealt
> > with, and is there an easy way to recover from them?
>
> Expanding on my earlier answer, there's an easy way to get the rescue
> DAGs to work correctly with retries.  In the top-level DAG in my example,
> just use the following as the POST script for the node that runs the
> lower-level DAG:
>
>     #! /bin/csh -f
>     if (-e lower.dag.rescue) then
>       mv lower.dag lower.dag.orig
>       mv lower.dag.rescue lower.dag
>     endif
>
> That way, if the lower-level DAG fails, the retry will actually run the
> rescue DAG, which picks up where the first try left off (the rescue DAG
> records which nodes were completed).
>
> Kent Wenger
> Condor Team
>
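
One thing that may be worth checking with the script quoted above: as far as I understand DAGMan, when a node has a POST script, the exit status of that script decides whether the node counts as succeeded. A script that only moves files will normally exit 0, which could hide the lower-level DAG's failure so that no retry happens. Below is a minimal sh sketch of the same idea that also passes the failure back. The script name swap_rescue.sh, the function name swap_in_rescue, and the argument order are assumptions made for illustration, not anything from Kent's example.

```shell
#!/bin/sh
# Hypothetical POST script sketch: swap_rescue.sh <dag-file> <node-exit-status>
# The script name, function name and argument order are illustrative assumptions.

swap_in_rescue() {
    dag=$1
    node_status=$2
    # Only swap when the failed run left a rescue file behind.
    if [ -e "$dag.rescue" ]; then
        # Keep the original DAG file, then put the rescue DAG in its place.
        mv "$dag" "$dag.orig" && mv "$dag.rescue" "$dag"
    fi
    # Hand the lower-level DAG's exit status back unchanged, so that a
    # failed run is still reported as a failure after the swap.
    return "$node_status"
}

# Run only when arguments are supplied (for example by DAGMan).
if [ "$#" -ge 2 ]; then
    swap_in_rescue "$1" "$2"
    exit $?
fi
```

Assuming your DAGMan version supports the $RETURN macro for POST scripts, and assuming the node is called "lower" (a placeholder name), the top-level DAG might hook it up like this:

    SCRIPT POST lower swap_rescue.sh lower.dag $RETURN
    RETRY lower 3

The retry count of 3 is arbitrary.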