
Re: [Condor-users] Synchronizing concurrent jobs in Condor



On Thu, Jun 10, 2010 at 7:09 AM, ivailo penev <ivopenev@xxxxxx> wrote:
 Hi, colleagues!

I have multiple independent jobs which perform calculations with data read from a common data source (in my case, a common Oracle database server). I am trying to start the jobs in a Condor pool in parallel. Because the jobs access the database concurrently, the simultaneous access decreases the parallel jobs' performance significantly. Is there any way (maybe a command or macro) in Condor to synchronize concurrent jobs accessing a common data source, without modifying the code of the jobs?

I was tempted to answer DAGMan but wanted to ask you some clarifying questions first.

Is your job flow, when it runs, something like:
  1. Gather data from the database (this part you want to serialize for all jobs)
  2. Process data for a long time (this part is okay to run in parallel)
  3. Return a result to the database (this part is also okay to run in parallel)
And you want step 1 to happen serially for all running jobs? If so: that's slightly more complicated than what DAGMan can handle IMO. Concurrency limits won't help you here either because a job consumes a resource counter and doesn't release it until it's done. So you can't "consume during step 1" and "release when you move to step 2" without making this a two-job DAG, which is probably not what you want.
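For completeness, the two-job-DAG variant mentioned above might look something like this (a hedged sketch; the node names, submit-file names, and the `oracle_reads` limit name are all hypothetical, and the `ORACLE_READS_LIMIT` knob would need to be set in the negotiator's config):

```
# work.dag -- split each work unit into two DAG nodes:
# FETCH holds a concurrency limit while reading from Oracle,
# PROCESS runs with no limit and can execute fully in parallel.
JOB fetch   fetch.sub
JOB process process.sub
PARENT fetch CHILD process
```

with `concurrency_limits = oracle_reads` in fetch.sub and, say, `ORACLE_READS_LIMIT = 1` in the pool configuration. The limit is released when the fetch node finishes, so the process node no longer counts against it -- which is exactly the consume/release split a single job can't express.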

It sounds like you'll need to serialize access for step 1 yourself. Depending on how wide your parallelism is, you can try the simple, but slightly starvation-prone, approach. Or you can concoct something more serious.
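One simple (and, as noted, starvation-prone) way to serialize step 1 without touching the job's core logic is an exclusive file lock on a shared filesystem, taken in a small wrapper around the fetch phase. A minimal sketch, assuming all execute machines can see the same lock-file path (the path, function names, and wrapper structure here are all illustrative, not anything Condor provides):

```python
# Sketch: serialize the database-read phase across jobs with an
# exclusive POSIX file lock. Requires a filesystem shared by all
# execute nodes (an assumption), and Unix (fcntl is Unix-only).
import fcntl


def fetch_with_lock(lock_path, fetch):
    """Run fetch() while holding an exclusive lock on lock_path.

    flock() blocks until the lock is free, but grants it in no
    particular order -- hence the starvation risk under wide
    parallelism.
    """
    with open(lock_path, "w") as lockfile:
        fcntl.flock(lockfile, fcntl.LOCK_EX)
        try:
            return fetch()  # step 1: the serialized database read
        finally:
            fcntl.flock(lockfile, fcntl.LOCK_UN)
    # steps 2 and 3 (compute, write back) run outside the lock,
    # so they remain fully parallel.
```

A wrapper script like this can be set as the job's executable, with the real program invoked inside `fetch`, so the jobs themselves stay unmodified.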

Of course, if your job flow isn't how I've described, then ignore this altogether. :)

- Ian