
Re: [HTCondor-users] dagman job submit times




On Mar 18, 2013, at 1:14 PM, R. Kent Wenger wrote:

On Mon, 18 Mar 2013, Stephen Pietrowicz wrote:

We've got a pool of nodes totaling about 1000 cores, and we have DAGMan submit the jobs.  In the configuration, we have:

DAGMAN_MAX_SUBMITS_PER_INTERVAL=1000
DAGMAN_SUBMIT_DELAY=0
DAGMAN_USER_LOG_SCAN_INTERVAL=5

When we look at the log files, we're only seeing about 23 jobs submitted per second, sometimes as low as 11-15 per second.  (This is all based on information gathered from the .log file Condor generates.)

Is this expected behavior, or is there something else we need to configure to have DAGMan increase the submission rate?

I haven't run a test on this in a while, but I think the numbers you're seeing are pretty much what is expected.  The last time we tested this, the overhead of DAGMan itself was pretty minimal, and the submission rate was largely determined by how long a condor_submit command takes to run, which in turn depends on how busy your schedd is.

There's a simple way to test this -- just make a script that does, say, 100 consecutive condor_submits, without any other logic, and time how long it takes to run.  That should give you a good idea of whether DAGMan is adding any significant overhead.
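A minimal sketch of that timing test, assuming condor_submit is on the PATH and that "sleep.sub" (a placeholder name) is an existing, valid submit file:

#!/usr/bin/env python
# Time N consecutive condor_submit calls to estimate the raw schedd
# submission rate, independent of DAGMan.  "sleep.sub" is a placeholder
# for any small, valid submit description file.
import os
import subprocess
import time

N = 100
SUBMIT_FILE = "sleep.sub"

devnull = open(os.devnull, "w")
start = time.time()
for _ in range(N):
    subprocess.check_call(["condor_submit", SUBMIT_FILE], stdout=devnull)
elapsed = time.time() - start
devnull.close()

print("%d submits in %.1f seconds (%.1f submits/sec)" % (N, elapsed, N / elapsed))

If the submits-per-second figure from this loop is close to the ~23/second seen through DAGMan, the schedd rather than DAGMan is the limiting factor.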

Kent Wenger
CHTC Team

Would it be better to have something like this in one submit file:

universe   = vanilla
executable = /path/to/my/executable

# each "queue" statement queues one job with the most recently set arguments
arguments = arg1 arg2 arg3
queue
arguments = newArg1 newArg2 newArg3
queue
changing the arguments for each of the 1000 jobs, and queuing them up separately?  If it works the way I think it would, we could do one condor_submit and have it queue everything from there.

I'd have to think about what the implications would be for our jobs, since we're using a diamond DAG right now to run a "preJob" to set everything up and then running the rest of the jobs (which rely on the preJob results).
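A rough sketch of one way to generate such a submit file, assuming the per-job argument sets come from a list in the script (the file name "many_jobs.sub" and the argument values are placeholders):

#!/usr/bin/env python
# Write a single submit description that queues one job per argument set,
# so all 1000 jobs go through a single condor_submit.
args_per_job = [("arg%d" % i, "argA", "argB") for i in range(1000)]

with open("many_jobs.sub", "w") as sub:
    sub.write("universe   = vanilla\n")
    sub.write("executable = /path/to/my/executable\n\n")
    for job_args in args_per_job:
        sub.write("arguments = %s\n" % " ".join(job_args))
        sub.write("queue\n")

The resulting file would then be submitted once with condor_submit many_jobs.sub.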