On Wed, 7 Sep 2011, Patty Bragger wrote:
I'm running into a performance issue of sorts when submitting DAGMan jobs. When I submit a DAG of, say, 100 nodes, it takes quite a while for all 100 nodes to show up in the queue. After an initial wait of about 12 seconds, the nodes are added to the queue at a rate of about 7 per second. The nodes have no dependencies on each other; they are completely standalone and could be submitted without using a DAG. When I submit the jobs without DAGMan, they are added to the queue much faster, about 100 per second. I get that submission rate whether I submit a single job with "queue 100" or put 100 separate jobs in one submit file.
Well, keep in mind that DAGMan runs a separate condor_submit for each node. When I do the same thing outside of DAGMan (one condor_submit per job), it's much slower than doing a single condor_submit that queues 100 jobs.
So I think you're basically seeing the overhead of a condor_submit call for every job versus a single condor_submit call.
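As a rough back-of-envelope check, here is a small Python sketch that plugs in the rates reported above (12 s startup, then about 7 nodes/s under DAGMan, versus about 100 jobs/s from one condor_submit). It assumes the rates stay constant, which is a simplification; the function names are made up for illustration.

```python
# Hypothetical helpers; the default rates are the ones reported in the question
# and will differ from pool to pool and version to version.

def dagman_fill_time(n_jobs: int, startup_s: float = 12.0, rate_per_s: float = 7.0) -> float:
    """Seconds until all jobs are queued when every node needs its own condor_submit."""
    return startup_s + n_jobs / rate_per_s


def single_submit_fill_time(n_jobs: int, rate_per_s: float = 100.0) -> float:
    """Seconds until all jobs are queued by one condor_submit that queues them all."""
    return n_jobs / rate_per_s


print(f"{dagman_fill_time(100):.1f}")         # 26.3
print(f"{single_submit_fill_time(100):.1f}")  # 1.0
```

So for 100 nodes the per-node submit overhead accounts for roughly 26 seconds versus about 1 second.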
Also note that (at least with recent versions of DAGMan) a single node's submit file can queue multiple jobs, as long as they are all part of the same cluster. I'm pretty sure (but not 100% sure) that this feature was already in 7.4.4. Of course, depending on exactly how you are using DAGMan, this may not be a good idea, since all of the jobs in that cluster then act as one DAG node (for dependencies, failure handling, and retries). But the option is there if one of your main goals is to get jobs into the queue as fast as possible.
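To make the difference concrete, here is a minimal Python sketch that generates the two layouts as text: a DAG with one node whose submit file queues all the jobs in one cluster, versus a DAG with one node per job. The executable name "my_program", the file name "batch.sub", and the node names are hypothetical; the JOB line and the queue/$(Process) syntax are standard, but check the manual for your Condor version.

```python
# Hypothetical names throughout ("my_program", "batch.sub", "Batch", "NodeN").

def make_submit_file(executable: str, count: int) -> str:
    """Submit description that queues `count` jobs in a single cluster."""
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        # $(Process) runs from 0 to count-1, so each job gets its own files
        "output     = out.$(Process)",
        "error      = err.$(Process)",
        "log        = batch.log",
        f"queue {count}",
    ]) + "\n"


def make_single_node_dag(submit_file: str, node: str = "Batch") -> str:
    """One node, so DAGMan runs condor_submit only once."""
    return f"JOB {node} {submit_file}\n"


def make_per_job_dag(submit_files: list[str]) -> str:
    """One node per job, so DAGMan runs condor_submit once per node."""
    return "".join(f"JOB Node{i} {sub}\n" for i, sub in enumerate(submit_files))


if __name__ == "__main__":
    print(make_submit_file("my_program", 100))
    print(make_single_node_dag("batch.sub"))
```

The trade-off is the one mentioned above: the single-node layout gets all 100 jobs queued with one condor_submit, but DAGMan can no longer track, fail, or retry them individually.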
Kent Wenger
Condor Team