Re: [Condor-users] dagman question
On Saturday, 28 July, 2012 at 8:13 AM, Rita wrote:
Let's say we have 5 jobs:
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Job E E.condor
Is it possible to have the entire dag fail if Job C (or whichever) fails? BTW, these are all independent jobs.
It sort of depends on what you mean by "fail". A DAG "fails" if any node in the DAG fails to run successfully, but some nodes may have completed successfully before the failure is encountered. So a successful DAG is one where all the nodes run successfully to completion; otherwise it's a failed DAG by definition.
If that kind of failure is all you want to test for, it's not hard. Just check the DAG output for the node status STATUS_ERROR to see if any node failed. I can't recall off the top of my head whether the DAGMan job itself exits with an error code if the DAG fails. Maybe it does? Hopefully it does. In that case you could check the scheduler's job history to see what the job ad says for the DAGMan job, which might be easier than parsing log files.
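One way to get per-node statuses in a form that is easy to check is DAGMan's NODE_STATUS_FILE command. It is not mentioned above and may not be available in older Condor releases, so treat this as a sketch. It assumes the five jobs from the question; the file name my.dag.status is made up for illustration:

```
# Sketch of a DAG input file; my.dag.status is a hypothetical file name.
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
JOB E E.condor

# Ask DAGMan to write the status of each node to a file while the DAG runs.
NODE_STATUS_FILE my.dag.status
```

After the DAG finishes, searching that file for STATUS_ERROR should show whether any node failed.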
If you want nothing else to run when a particular node fails, then you need to set up a parent-child relationship. For example, if you don't want anything else to run if C fails, you want:
PARENT C CHILD A B D E
Now if C fails to run, nothing else will run, because you've set the other jobs as children of C, and children only run if their parent is successful.
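Putting the job list from the question together with that dependency line, the whole DAG input file might look like the sketch below. The submit file names come from the question; nothing else is assumed:

```
# C is the gatekeeper: A, B, D and E are submitted only after C succeeds.
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
JOB E E.condor

PARENT C CHILD A B D E
```

The trade-off is that A, B, D and E no longer start right away. Even though they are independent of C, they wait until C has finished successfully.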
Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools