
[Condor-users] Newbie question: with DAGMan, how can I restart a failed job after another script has fixed the problem?



Hi,

I'm doing some exercises with Condor to better understand how this large system works.
Suppose I have a few jobs organized as follows:
A -> B -> C -> F
       -> D -> E

(D depends on B)
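For reference, here is a minimal sketch of how that graph might be written as a DAGMan input file, generated from Python. Only the JOB and PARENT ... CHILD keywords come from DAGMan; the submit-file names (a.sub, b.sub, ...) and the build_dag helper are hypothetical.

```python
# Sketch: build a DAGMan input file for A -> B -> {C, D}, C -> F, D -> E.
# Submit-file names such as "b.sub" are hypothetical placeholders.

NODES = ["A", "B", "C", "D", "E", "F"]

# parent -> children, mirroring the diagram above
EDGES = {
    "A": ["B"],
    "B": ["C", "D"],  # C and D both depend on B
    "C": ["F"],
    "D": ["E"],
}


def build_dag(nodes, edges):
    """Return the DAG file text: one JOB line per node, then the dependencies."""
    lines = [f"JOB {n} {n.lower()}.sub" for n in nodes]
    for parent, children in edges.items():
        lines.append(f"PARENT {parent} CHILD {' '.join(children)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(NODES, EDGES), end="")
```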

Now suppose B fails. I don't want to retry B immediately with RETRY B <number of times>.
Instead, I want to launch another script that repairs the problem and then restarts B.
I guess I could use the PRE and POST scripts.
Say my POST script, which runs after B finishes, checks B's return value and, if there is a problem, fixes it. How can I then tell DAGMan to restart B?
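One possible approach, sketched under assumptions: as far as I understand DAGMan, when a node has a POST script, the script's exit code (not the job's) decides whether the node succeeded, and a RETRY line resubmits a node that failed. So a POST script could repair the problem and then exit nonzero, and B would be rerun only after the repair rather than immediately. The file name post_b.py, the repair_b placeholder and the retry count are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical POST script for node B (post_b.py). Assumes the DAG file has:
#   SCRIPT POST B post_b.py $RETURN
#   RETRY B 3
# DAGMan replaces $RETURN with B's exit code and uses this script's exit
# code as the node's result; a nonzero result lets RETRY resubmit B.
import sys


def repair_b():
    """Placeholder: put whatever fixes B's problem here."""
    pass


def post_b(job_return, repair=repair_b):
    """Return the exit code DAGMan should see for node B."""
    if job_return == 0:
        return 0  # B succeeded: C and D may now run
    repair()      # B failed: fix the cause before B is resubmitted
    return 1      # still report failure so the RETRY line resubmits B


if __name__ == "__main__" and len(sys.argv) > 1:
    # DAGMan passes B's exit code as the first argument ($RETURN).
    sys.exit(post_b(int(sys.argv[1])))
```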
Do I have to create a new workflow like this?
B -> C -> F
  -> D -> E
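A separate workflow may not be needed: with a POST script and a RETRY line for node B, the existing DAG only gains two lines. The sketch below, built around a hypothetical add_repair_and_retry helper, shows which lines; SCRIPT POST, $RETURN and RETRY are DAGMan keywords, while the script name and retry count are placeholders. If B still fails after its last retry, DAGMan's rescue DAG should let the workflow be resubmitted so that only the unfinished nodes run.

```python
# Sketch: instead of writing a second workflow (B -> C -> F, B -> D -> E),
# append repair-and-retry declarations for one node to the existing DAG text.
# "post_b.py" and the retry count are hypothetical choices.


def add_repair_and_retry(dag_text, node, post_script, retries):
    """Return dag_text with a POST script and a RETRY line added for node."""
    extra = [
        f"SCRIPT POST {node} {post_script} $RETURN",  # DAGMan fills in $RETURN
        f"RETRY {node} {retries}",
    ]
    if dag_text and not dag_text.endswith("\n"):
        dag_text += "\n"
    return dag_text + "\n".join(extra) + "\n"


if __name__ == "__main__":
    original = "JOB A a.sub\nJOB B b.sub\nPARENT A CHILD B\n"
    print(add_repair_and_retry(original, "B", "post_b.py", 3), end="")
```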

Thanks.