
Re: [Condor-users] slow scheduling of dagman jobs



On Wed, 7 Sep 2011, Patty Bragger wrote:

I'm running into a performance issue of sorts with submitting dagman jobs.
When submitting a dagman job of say 100 nodes, I find that it takes quite a
while for all 100 nodes to show up in the queue.  After an initial wait of
about 12 seconds, the nodes are added to the queue at a rate of about 7 per
second. The nodes have no dependencies on each other, they are completely
stand alone and could be submitted without using dag.  When I do submit jobs
without using dag, the jobs are added to the queue much faster, about
100/second.  I can get that submission rate whether submitting one job with
a "queue 100"  or submitting 100 separate jobs in one submit file.

Well, keep in mind that DAGMan is doing a separate condor_submit for each node. When I do that (outside of DAGMan) it's much slower than doing a single condor_submit that queues 100 jobs.

So I think you're basically seeing the overhead of a condor_submit call for every job versus a single condor_submit call.
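As a rough back-of-the-envelope check, the figures quoted above (an initial wait of about 12 seconds, then about 7 nodes per second under DAGMan, versus about 100 jobs per second with a plain condor_submit) can be turned into approximate times for 100 jobs. The function name below is purely illustrative, and the rates are the poster's observations, not guaranteed values:

```python
def seconds_to_queue(jobs, rate_per_sec, initial_wait=0.0):
    """Approximate wall-clock time to get `jobs` into the queue."""
    return initial_wait + jobs / rate_per_sec

# Rates taken from the original report; they will differ on other pools.
dagman = seconds_to_queue(100, 7, initial_wait=12)   # one condor_submit per node
direct = seconds_to_queue(100, 100)                  # single condor_submit

print(f"DAGMan: ~{dagman:.1f} s, direct condor_submit: ~{direct:.1f} s")
```

So for this workload the per-node condor_submit overhead accounts for most of the difference.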

Also, at least with recent versions of DAGMan, a node's submit file can queue multiple jobs (as long as they are all part of the same cluster). I'm pretty sure (but not 100% sure) that this feature was in 7.4.4. Of course, depending on exactly how you are using DAGMan, this may not be a good idea, but the option is there if one of your main goals is to get jobs into the queue as fast as possible.
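A minimal sketch of what that could look like, assuming a vanilla-universe job: the script below only builds the text of a submit file that queues 100 jobs in one cluster, plus a one-node DAG file pointing at it. The executable, the file names (nodes.sub, nodes.log) and the node name AllNodes are made up for illustration, and you should check that your DAGMan version accepts multi-job clusters before relying on it.

```python
def make_submit(n_jobs, executable="/bin/echo"):
    """Build a submit description that queues n_jobs in a single cluster."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",      # hypothetical executable
        "arguments  = $(Process)",
        "output     = node.$(Process).out",
        "error      = node.$(Process).err",
        "log        = nodes.log",          # hypothetical log file name
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"

def make_dag(node_name, submit_file):
    """Build a DAG file with one node that runs the whole cluster."""
    return f"JOB {node_name} {submit_file}\n"

submit_text = make_submit(100)
dag_text = make_dag("AllNodes", "nodes.sub")

print(submit_text)
print(dag_text)
```

The trade-off is that DAGMan then sees the 100 jobs as one node, so you give up per-job dependencies and retries.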

Kent Wenger
Condor Team