
Re: [Condor-users] Gracefully stopping DAGMAN




On Mar 26, 2010, at 11:36 AM, R. Kent Wenger wrote:

On Thu, 25 Mar 2010, Robert Mortensen wrote:

I have a situation where I submit a DAG where each node has a PRE and POST script, there are no parent/child relationships since each node is independent. The PRE script prepares the data for the node to use, the POST script post processes the data and marks the status of each node in a separate database. We have a script that allows our users to cancel the run (a run may have thousands of nodes and take several hours to complete). The question is, how can I stop the DAG but have the post script of each node that has started running be run?

Currently, I put a "KILL" file in the directory the DAG is run from; the PRE scripts check for this file and exit with a non-zero result. This keeps nodes that have not yet run from being added to the queue. Then I condor_rm each of the idle and running nodes, which evicts them and runs their POST scripts (which is what I need). I then just wait for the DAG to finish. If there are a lot of unrun nodes, I must wait for all of their PRE scripts (which do nothing) to run, which is wasteful and can take a while.
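For reference, the KILL-file workaround described above might look roughly like the sketch below. It is a minimal, hypothetical helper, not Robert's actual script: it assumes the PRE script runs with the DAG's directory as its working directory, and the `condor_rm -constraint` form assumes node jobs carry the `DAGManJobId` attribute pointing at the DAGMan job's cluster id. All function names here are made up for illustration.

```python
import os
import sys

# Sentinel file name taken from the thread; assumed to live in the
# directory the DAG is run from.
KILL_FILE = "KILL"


def should_abort(directory="."):
    """Return True if the cancel sentinel file exists in `directory`."""
    return os.path.exists(os.path.join(directory, KILL_FILE))


def request_cancel(directory="."):
    """Create the sentinel file so that later PRE scripts skip their nodes."""
    with open(os.path.join(directory, KILL_FILE), "w"):
        pass


def build_rm_command(dagman_cluster):
    """Build a condor_rm command for every job submitted by the given DAGMan
    job (assumes node jobs carry the DAGManJobId attribute)."""
    return ["condor_rm", "-constraint", "DAGManJobId == %d" % dagman_cluster]


def pre_script_main(directory="."):
    """PRE script entry point: a non-zero exit marks the PRE script as
    failed, so DAGMan never submits the node job."""
    if should_abort(directory):
        sys.stderr.write("KILL file found; skipping this node\n")
        return 1
    # Hypothetical: prepare the node's input data here.
    return 0


if __name__ == "__main__":
    sys.exit(pre_script_main())
```

In the DAG file each node would reference it with something like `SCRIPT PRE NodeName python pre_check.py` (the file name is hypothetical), while the user-facing cancel script would call `request_cancel()` and then run the list returned by `build_rm_command(cluster)` with `subprocess.run`. As the thread notes, this still pays the cost of every remaining PRE script starting up only to exit.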

Basically, I need to signal DAGMan to stop running PRE scripts and submitting nodes, condor_rm all submitted nodes, and run any pending POST scripts. Is there any way to do this?

BTW, I'm running Condor 7.4.1 on Windows.

Hmm. I can't think of an easy way to do exactly what you want. If you condor_rm the DAGMan job, it will remove all of the node jobs, but it won't run any of the POST scripts.

I'm thinking that the real solution to this problem is to add a configuration knob to tell DAGMan exactly what you want it to do when you condor_rm it -- so you could tell it, for example, to remove jobs in the queue, but still go ahead and run the POST scripts. How does that sound?


I like this idea. I recently developed a workflow that stages and unstages data to a web server. During the development, it would have been very handy to have a "do this on condor_rm" knob so that unstaging would occur when I stopped my DAG prematurely.

    Craig

Kent Wenger
Condor Team
_______________________________________________
Condor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

--
Craig A. Struble, Ph.D. | 369 Cudahy Hall  | Marquette University
Associate Professor of Computer Science    | (414)288-3783
Director, Master of Bioinformatics Program | (414)288-5472 (fax)
http://www.mscs.mu.edu/~cstruble | craig.struble@xxxxxxxxxxxxx