
Re: [Condor-users] nested dags and splices



On Tue, 3 Feb 2009, Leandro Martelli wrote:

Dear All,

I have the following situation:

P1: Program 1 runs and produces n files;
P2: Program 2 is then executed for each individual file and produces
output for them;
P3: Program 3 sums the outputs of all P2 executions (therefore depends
on all P2 executions);


I tried with nested dags and I'm checking now splices. With nested dags
(for step P2) I noticed that the file (the nested one - P2 in this case)
must be already written, but in my case it's going to be written only
after P1 completes (n is not known until the end of P1). As far as I
understand, splices work the same way, where the complete flow must be
fully defined at start.

So you're basically trying to use nested DAGs to deal with the fact that
you don't know n for P2 ahead of time, right?  And what you want to do is
have P1 write the DAG that will run the P2 instances?

I'm assuming that you're running a pretty recent version of DAGMan (at least 7.1.x). Also, does your top-level DAG look something like this?

    JOB P1 P1.sub
    SUBDAG EXTERNAL P2 P2.dag
    JOB P3 P3.sub
    PARENT P1 CHILD P2
    PARENT P2 CHILD P3

Is there any other way or additional tool where I could dynamically
configure my workflow?

Here's an idea that I think will work (but I haven't actually tried it):

1) Write a "dummy" P2.dag before you submit the top-level DAG (this is
necessary so that the nested DAG node's log file will be defined at the beginning of the top-level DAG run).

2) P1 overwrites P2.dag.

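3) Have a POST script for the P1 node that regenerates the P2.dag.condor.sub file from the new P2.dag (-f forces overwriting the existing files, and -no_submit writes the submit file without actually submitting the DAG):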

    condor_submit_dag -f -no_submit P2.dag
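
For step 2, here is a minimal, untested sketch of how P1 (or a small helper it calls) might write P2.dag once n is known. It assumes a single shared submit file, here called P2.sub by analogy with P1.sub, that picks up its input through a macro; the macro name "infile", the node names P2_0, P2_1, ..., and the function names are all made up for illustration. The nodes have no PARENT/CHILD lines because the P2 runs are independent of each other; P3 already waits for the whole nested DAG in the top-level file.

```python
from pathlib import Path


def build_p2_dag(input_files, submit_file="P2.sub"):
    """Return the text of a DAG with one independent node per input file.

    Each node reuses the same submit file (hypothetical name) and gets
    its input file name through a VARS macro called "infile" (also a
    hypothetical name), which the submit file would reference as
    $(infile).
    """
    lines = []
    for i, name in enumerate(input_files):
        node = f"P2_{i}"
        lines.append(f"JOB {node} {submit_file}")
        lines.append(f'VARS {node} infile="{name}"')
    return "".join(line + "\n" for line in lines)


def write_p2_dag(input_files, dag_path="P2.dag"):
    """Overwrite dag_path with a DAG covering the given input files."""
    Path(dag_path).write_text(build_p2_dag(input_files))
```

P1 would call write_p2_dag() with the list of files it produced as its last action, so that the POST script in step 3 sees the real P2.dag rather than the dummy one.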

I think that should do what you want.

(Note: we're working on changes that will make this kind of thing much easier in the future, but they're not fully implemented yet.)

Kent Wenger
Condor Team