
Re: [Condor-users] Best Practices: how to handle a DAG with an unknown number of jobs in one step...



On Thu, 25 Feb 2010, Daniel Pittman wrote:

G'day.

We have a regular set of jobs, and I am a bit puzzled about how best to model them in Condor. Specifically, the model is this:

Step 1: Fetch data from our collection system.
Step 2: Process that data into an unknown number of "packs".
Step 3: Generate one report for each pack produced at step 2.

Now, stages one and two are pretty easy, but I would ideally like to have a single DAG that would encapsulate the whole process.

What gives me trouble is working out how to get step 3 to generate one Condor job for each pack, since the packs can trivially run in parallel, and we generally only run one or two of the overall jobs at any given time.[1]

I can think of two ways to approach this:

1) Use a single submit file for step 3 that submits however many jobs
you need.  (This may not be possible depending on how much the jobs
have to differ in their arguments, etc., because for DAGMan all of a node's jobs have to be in the same cluster.)

2) Use a nested DAG for step 3 with one node for each "pack".

Anyhow, here's the explanation of the two approaches. This assumes you're running a recent version of DAGMan (e.g., 7.4.1 or later); if you're using something much older than that, this will be harder to do.


For option 1, all you have to do is have the node job for the process step (or its post script) write the submit file for the report step. In recent versions of DAGMan, the submit file for a node job doesn't have to exist until right before that job is actually submitted, so you can do this. In older versions, you'll have to have a placeholder submit file in existence ahead of time and overwrite it (but you must keep the same log file). So your DAG would look like this:

  Job fetch fetch.sub
  Job process process.sub
  Job report report.sub
  Parent fetch Child process
  Parent process Child report

and the process step would have to write report.sub.
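
To make that concrete, here is a rough sketch of what the generated report.sub might look like (the executable name, the argument layout, and the pack count are placeholders for whatever your report tool and data actually use). Because all of a node's jobs have to be in one cluster, a single queue statement submits every pack, with $(Process) selecting the pack for each job:

  # report.sub -- written by the process step (hypothetical sketch)
  universe   = vanilla
  executable = generate_report
  # $(Process) runs from 0 to N-1, so each job in the cluster gets its own pack
  arguments  = --pack pack_$(Process).dat
  output     = report_$(Process).out
  error      = report_$(Process).err
  log        = report.log
  # the process step fills in the actual number of packs it found
  queue 12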


For option 2, your DAG file would look like this:

  Job fetch fetch.sub
  Job process process.sub
  Subdag External report report.dag
  Parent fetch Child process
  Parent process Child report

and the process job would have to write report.dag. (For this to work, you'll have to have a dummy report.dag file in place at submit time, but you can overwrite it. This restriction should go away soon.) Unless you can reuse the same submit file for all "packs" using the VARS feature (I would guess you probably can), the process job would also have to write the submit files for the report.dag nodes.
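
As a rough sketch of that (the file names and the "pack" macro are placeholders), the generated report.dag could list one node per pack and hand each node its pack name through VARS:

  # report.dag -- written by the process step (hypothetical sketch)
  Job report0 report-pack.sub
  Vars report0 pack="pack_0.dat"
  Job report1 report-pack.sub
  Vars report1 pack="pack_1.dat"
  # ...one Job/Vars pair per pack...

Since there are no Parent/Child lines, DAGMan runs all of the report nodes in parallel. The shared submit file then just refers to the macro:

  # report-pack.sub -- one submit file reused for every pack
  universe   = vanilla
  executable = generate_report
  arguments  = --pack $(pack)
  output     = $(pack).report
  error      = $(pack).err
  log        = report-pack.log
  queue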

Option 2 is a little more work, but is a more general solution.

Kent Wenger
Condor Team