On Fri, Jul 23, 2010 at 9:49 AM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
> If the job is broken into a DAG where jobA (accessDB) has concurrency
> limits, and jobB (operate) has no limits, I believe this should work.
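The quoted structure could be sketched as a minimal DAGMan file (the submit file names are illustrative assumptions on my part):

```
# accessdb.sub declares the concurrency limit; operate.sub does not
JOB A accessdb.sub
JOB B operate.sub
PARENT A CHILD B
```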
You might need to take this a little further: the jobA part of the DAG should modify the machine's ClassAd when it runs successfully, tagging the machine to indicate that the data has been staged there. The jobB jobs should then make the existence of this tag part of their requirements expression, so they only run on machines where the data has been staged.
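One common way to implement the tag (an assumption on my part, not the only option) is a STARTD_CRON hook on the execute machines that advertises a custom attribute when it finds a marker file that jobA dropped; jobB's submit file then requires that attribute. A sketch, with HasStagedData and the script path as illustrative names:

```
# --- condor_config additions on the execute machines (sketch) ---
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) STAGECHECK
STARTD_CRON_STAGECHECK_EXECUTABLE = /usr/local/bin/stagecheck.sh
STARTD_CRON_STAGECHECK_PERIOD = 5m
# stagecheck.sh checks for the marker file and, if present, prints:
#   HasStagedData = True

# --- jobB submit file (sketch) ---
requirements = (TARGET.HasStagedData =?= True)
```

The =?= (meta-equals) comparison is deliberate: it evaluates to False rather than Undefined on machines that don't advertise the attribute at all.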
With that requirement in place, some jobB jobs may never find a machine to run on unless you run enough jobA jobs to touch every machine in your pool (which is in itself a non-trivial thing to enforce in Condor).
Depending on how widely the data is shared between your different jobs, or how big your runs are, you may not even need DAGs here. You wouldn't want to wait for *all* of the jobA jobs to finish before starting the jobB jobs -- you just don't want jobB jobs to run where no jobA job has ever run.
You could just run two classes of jobs:
1. "Producer" jobs: concurrency-limited jobs that cache data and tag the machine to indicate a visit;
2. "Consumer" jobs: jobs that require cached data on a machine before they will run there.
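As a sketch, the two classes might look like this in submit files (the limit name DB_ACCESS, the attribute HasStagedData, and the executables are illustrative; the pool-wide limit is set in the central manager's configuration):

```
# --- central manager config: at most 5 producers hit the DB at once ---
DB_ACCESS_LIMIT = 5

# --- producer.sub: caches data and tags the machine ---
executable         = stage_data.sh
concurrency_limits = DB_ACCESS
queue

# --- consumer.sub: only matches machines a producer has visited ---
executable   = crunch.sh
requirements = (TARGET.HasStagedData =?= True)
queue
```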
You run lots of producer jobs and they slowly populate data across the machines; as data is populated, more and more consumer jobs can run. It all depends on how much of the dataset is common between consumer jobs, I suppose.
You'll also need to consider a third class of job: the "cleanup" job that can be used to remove old data from the machines and untag the boxes.
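The cleanup job could be a third submit file whose executable deletes the cached data and removes the marker file the tagging hook looks for, so the attribute drops out of the machine ad on the hook's next run (again, names here are illustrative):

```
# --- cleanup.sub (sketch) ---
executable   = cleanup.sh
# only bother visiting machines that are actually tagged
requirements = (TARGET.HasStagedData =?= True)
queue
```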