
Re: [Condor-users] dagman question



On Saturday, 28 July, 2012 at 8:13 AM, Rita wrote:
Let's say we have 5 jobs:

Job  A  A.condor 
Job  B  B.condor 
Job  C  C.condor 
Job  D  D.condor 
Job  E  E.condor 
Is it possible to have the entire DAG fail if Job C (or whichever) fails? BTW, these are all independent jobs.
It sort of depends on what you mean by "fail". A DAG "fails" if any node in the DAG fails to run successfully, but some nodes may have completed successfully before a failure is encountered. So a successfully run DAG is one where all the nodes run successfully to completion, otherwise it's a failed DAG by definition.
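Here is a rough illustration of that definition. This minimal Python sketch is not part of DAGMan: the `summarize_dag` function and the per-node result dictionary are hypothetical. It treats a DAG as successful only when every node succeeded, while still reporting which nodes did finish:

```python
def summarize_dag(results):
    """Summarize per-node outcomes for a DAG.

    `results` maps node name -> True (node succeeded) or False (node failed).
    This is a hypothetical model, not DAGMan's own data structure.
    """
    succeeded = sorted(name for name, ok in results.items() if ok)
    failed = sorted(name for name, ok in results.items() if not ok)
    # The DAG as a whole succeeds only if no node failed.
    return len(failed) == 0, succeeded, failed


# Five independent nodes where only C fails: four nodes still complete,
# but the DAG as a whole counts as failed.
dag_ok, done, failed = summarize_dag(
    {"A": True, "B": True, "C": False, "D": True, "E": True}
)
print(dag_ok, done, failed)
```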

If that kind of failure is all you want to test for, it's not hard. Just check the DAG output for the node status STATUS_ERROR to see if any node failed. I can't recall off the top of my head whether the DAGMan job itself exits with an error code when the DAG fails. Maybe it does; hopefully it does. If so, you could check the scheduler's job history to see what the DAGMan job's ad says about how it exited, which might be easier than parsing log files.
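Here is a minimal Python sketch of the log-checking approach. It assumes the DAG writes a node status file in which each node's entry has a `Node = "<name>";` line followed by a `NodeStatus = ...;` line carrying a `/* "STATUS_ERROR" */` comment. That layout is an assumption, and the exact format may differ between Condor versions, so compare it against a real status file before relying on it. The function name `failed_nodes` and the sample text are hypothetical.

```python
import re

# ASSUMED layout of a node status entry (verify against your Condor version):
#   Node = "C";
#   NodeStatus = 6; /* "STATUS_ERROR" */
NODE_RE = re.compile(r'^\s*Node\s*=\s*"([^"]+)"')


def failed_nodes(status_text):
    """Return names of nodes whose NodeStatus line mentions STATUS_ERROR."""
    failed = []
    current = None  # name of the node entry we are currently inside
    for line in status_text.splitlines():
        match = NODE_RE.match(line)
        if match:
            current = match.group(1)
        elif "NodeStatus" in line and "STATUS_ERROR" in line and current:
            failed.append(current)
    return failed


# Hypothetical sample in the assumed layout.
sample = '''
[
  Type = "NodeStatus";
  Node = "C";
  NodeStatus = 6; /* "STATUS_ERROR" */
]
[
  Type = "NodeStatus";
  Node = "D";
  NodeStatus = 5; /* "STATUS_DONE" */
]
'''
print(failed_nodes(sample))  # ['C']
```

Matching on the `STATUS_ERROR` text rather than the numeric code keeps the sketch from depending on which number a given version assigns to that state.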

If you want nothing else to run when a node in your list fails, then you need to set up a parent-child relationship. For example, if you don't want anything else to run if C fails, you want:

PARENT C CHILD A B D E

Now if C fails to run, nothing else will run, because you've made the other jobs children of C and a child only runs if its parent succeeds.
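The following small Python model shows why the PARENT/CHILD line stops the other jobs: a child runs only after all of its parents succeed. It is a hypothetical simulation, not DAGMan itself. The `run_dag` function, the `parents` mapping, and the `will_fail` set are illustrative assumptions, and retries, PRE/POST scripts, and rescue DAGs are ignored.

```python
def run_dag(nodes, parents, will_fail):
    """Simulate which nodes run, succeed, or are skipped.

    nodes:     node names, e.g. ["A", "B", "C", "D", "E"]
    parents:   maps a child node to the set of its parent nodes
    will_fail: nodes that fail when they are run
    """
    status = {}  # node -> "done", "error", or "skipped"
    pending = list(nodes)
    while pending:
        progressed = False
        for node in list(pending):  # iterate a copy so we can remove
            deps = parents.get(node, set())
            if any(status.get(p) in ("error", "skipped") for p in deps):
                # A parent failed (or never ran), so this child never runs.
                status[node] = "skipped"
            elif all(status.get(p) == "done" for p in deps):
                status[node] = "error" if node in will_fail else "done"
            else:
                continue  # parents still pending; try again next pass
            pending.remove(node)
            progressed = True
        if not progressed:
            raise ValueError("cycle in DAG")
    return status


jobs = ["A", "B", "C", "D", "E"]

# Independent jobs: C fails, but A, B, D and E still run and finish.
print(run_dag(jobs, {}, {"C"}))

# PARENT C CHILD A B D E: C fails, so none of its children run.
print(run_dag(jobs, {n: {"C"} for n in "ABDE"}, {"C"}))
```

In both cases the DAG as a whole fails. The parent-child version only changes whether the remaining jobs get to run at all.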

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing