[Condor-users] Gracefully stopping DAGMAN



I submit a DAG in which each node has a PRE and a POST script. There are no parent/child relationships, since each node is independent. The PRE script prepares the data for the node to use; the POST script post-processes the data and marks the status of each node in a separate database. We have a script that allows our users to cancel the run (a run may have thousands of nodes and take several hours to complete). The question is: how can I stop the DAG but still have the POST script run for each node that has already started?

Currently, I put a "KILL" file in the directory the DAG is run from; the PRE scripts check for this file and exit with a non-zero result. This keeps nodes that have not yet run from being added to the queue. Then I condor_rm each of the idle and running nodes, which evicts them and runs their POST scripts (which is what I need). I then just wait for the DAG to finish. If there are a lot of unrun nodes, I must wait for all their PRE scripts (which do nothing) to run, which is a waste and can take a while.
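
For reference, the KILL-file check described above might look roughly like the following. This is only a minimal sketch in Python, not the actual script: the file name "KILL" comes from the description, while should_abort, prepare_data and the run_dir parameter are hypothetical names used for illustration. It assumes the PRE script is started in the directory the DAG is run from.

```python
import os
import sys

# Sentinel file name, as described above; everything else here is illustrative.
KILL_FILE = "KILL"


def should_abort(run_dir="."):
    """Return True if the sentinel file exists in the DAG's run directory."""
    return os.path.isfile(os.path.join(run_dir, KILL_FILE))


def prepare_data():
    """Placeholder for the real data preparation (hypothetical)."""
    pass


def main(run_dir="."):
    if should_abort(run_dir):
        # A non-zero exit marks the PRE script as failed,
        # so the node's job is not submitted.
        return 1
    prepare_data()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The check itself is cheap, but it still runs once per unrun node, which is the waiting cost mentioned above.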

Basically, I need to signal DAGMan to stop running PRE scripts and submitting nodes, condor_rm all submitted nodes, and run any pending POST scripts. Is there any way to do this?

BTW, I'm running on Windows with 7.4.1.

Thanks,
Bob Mortensen