
Re: [HTCondor-users] dagman job submit times



On Mon, 18 Mar 2013, Stephen Pietrowicz wrote:

> We've got a pool of nodes totaling about 1000 cores. We have DAGMan submit the jobs. In the configuration, we have:
>
> DAGMAN_MAX_SUBMITS_PER_INTERVAL=1000
> DAGMAN_SUBMIT_DELAY=0
> DAGMAN_USER_LOG_SCAN_INTERVAL=5
>
> When we look at the log files, we're only seeing about 23 jobs submitted per second, sometimes as few as 11-15 per second. (All of this is gathered from the .log file Condor generates.)
>
> Is this expected behavior, or is there something else we need to configure to have DAGMan increase the submission rate?

I haven't run a test on this in a while, but I think the numbers you're seeing are pretty much what is expected. The last time we tested this, DAGMan's own overhead was minimal; the submission rate depended mainly on how long each condor_submit command took to run, which in turn depends largely on how busy your schedd is.
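As a rough sanity check, assume DAGMan runs condor_submit for one node at a time and its own overhead is negligible (as the last test suggested). Then the observed submission rate r translates directly into a per-submit time:

```latex
r \approx \frac{1}{t_{\text{submit}}}
\quad\Longrightarrow\quad
t_{\text{submit}} \approx \frac{1}{r},
\qquad
\frac{1}{23\,\text{s}^{-1}} \approx 43\,\text{ms},
\qquad
\frac{1}{11\,\text{s}^{-1}} \approx 91\,\text{ms}
```

If timing condor_submit by hand gives a per-submit time in that range, the limit is the submit path (condor_submit plus the schedd) rather than DAGMan itself.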

There's a simple way to test this: write a script that runs, say, 100 consecutive condor_submit commands with no other logic, and time how long it takes. That should give you a good idea of whether DAGMan is adding any significant overhead.
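A minimal sketch of that test in Python, assuming condor_submit is on your PATH and that you supply a submit description file for a trivial job (the file name `test.sub` and script name `time_submits.py` below are hypothetical). It runs the submits strictly one after another and reports the elapsed time and the resulting submits per second:

```python
import subprocess
import sys
import time


def time_serial_submits(cmd, n=100):
    """Run `cmd` n times back to back; return (elapsed_seconds, runs_per_second)."""
    start = time.perf_counter()
    for _ in range(n):
        # check=True raises CalledProcessError if a submit fails,
        # so a failed submit is never counted as a fast one.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return elapsed, n / elapsed


if __name__ == "__main__":
    # Expects one argument: a submit description file for a trivial job
    # (hypothetical, e.g. test.sub; supply your own).
    if len(sys.argv) != 2:
        print("usage: time_submits.py SUBMIT_FILE", file=sys.stderr)
    else:
        elapsed, rate = time_serial_submits(["condor_submit", sys.argv[1]], n=100)
        print(f"100 submits in {elapsed:.1f} s ({rate:.1f} submits/s)")
```

Run it as `python time_submits.py test.sub`. Keep in mind that this puts 100 clusters in your queue, so you may want to condor_rm them afterwards. Compare the reported rate with what DAGMan achieves; if they are close, DAGMan is not the bottleneck.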

Kent Wenger
CHTC Team