[HTCondor-users] Order in which dagman queues jobs



From experimentation, it seems that dagman queues up jobs in the order that they become ready. Is this true, and is there any way to change this?

Let me explain what I'm doing. I have a DAG that has a number of independent job threads, each of which is a linear chain of nodes, i.e. something like this:

A1 -> B1 -> C1 -> D1
A2 -> B2 -> C2 -> D2
A3 -> B3 -> C3 -> D3
...
A1000 -> B1000 -> C1000 -> D1000
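
For concreteness, here is a minimal sketch of how a DAG file with this shape could be generated. It uses only the standard JOB and PARENT ... CHILD lines; the function name and the submit file names (a.sub, b.sub, c.sub, d.sub) are placeholders for illustration, not my real files.

```python
def chain_dag_lines(n_chains, stages=("A", "B", "C", "D")):
    """Return DAG-file lines for n_chains independent linear chains."""
    lines = []
    for i in range(1, n_chains + 1):
        # One JOB line per node; the .sub file names are hypothetical.
        for stage in stages:
            lines.append(f"JOB {stage}{i} {stage.lower()}.sub")
        # Link each stage to the next one within the same chain only.
        for parent, child in zip(stages, stages[1:]):
            lines.append(f"PARENT {parent}{i} CHILD {child}{i}")
    return lines


if __name__ == "__main__":
    # Two chains are enough to show the pattern; the real DAG has 1000.
    print("\n".join(chain_dag_lines(2)))
```

Each chain contributes four JOB lines and three PARENT ... CHILD lines, and there are no dependencies between chains.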

The 'A' jobs complete very quickly, each within a second or two; dagman can't submit them into the queue fast enough. The B and C jobs are relatively long-running and compute intensive, and the D jobs are quite short.

What I'm discovering from watching progress is:

- Even when some of the A jobs have completed (and therefore the related B jobs are ready to run), dagman continues to submit all the remaining A jobs before it starts to submit any B jobs. Therefore these compute-heavy jobs don't start to run as soon as they might.

- Things move more or less in lock step (i.e. there's a phase when A jobs are running, then a phase when B jobs are running, then C jobs, and so on).

- At the end, when the D jobs are running, the queue empties out because these jobs are short, and again dagman can't submit jobs fast enough.

Obviously one thing I need to do is to get dagman to push jobs into the queue faster, and I'm going to investigate some of the ideas at https://www-auth.cs.wisc.edu/lists/htcondor-users/2013-August/msg00002.shtml

However, in my case it would also be helpful if dagman queued up jobs in a different order: for example, when an 'A' job completes, queue up its corresponding 'B' job in preference to another 'A' job. This would mix the workload better through the lifetime of the jobs, and some of the completed results would also come out sooner.

Any pointers?

I've read through
http://research.cs.wisc.edu/htcondor/manual/v8.0/2_10DAGMan_Applications.html
http://research.cs.wisc.edu/htcondor/manual/v8.0/condor_submit_dag.html
and can't find anything relevant. I will try setting categories (e.g. max 500 'A' jobs, max 500 'B' jobs at any one time), but that's not exactly what I'm looking for.
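
A rough sketch of the category idea, assuming one category per stage letter and using the CATEGORY and MAXJOBS lines from the DAGMan documentation. The function name and the limits are illustrative only, and note that this only caps how many jobs of each stage are submitted at once; it does not by itself change the order in which ready nodes are submitted.

```python
def category_lines(n_chains, limits):
    """Return CATEGORY/MAXJOBS lines, one category per stage letter.

    limits maps a stage letter (e.g. "A") to the maximum number of
    nodes of that stage dagman may have submitted at any one time.
    """
    lines = []
    for stage in limits:
        # Assign every node of this stage to a category named after it.
        for i in range(1, n_chains + 1):
            lines.append(f"CATEGORY {stage}{i} {stage}")
    for stage, cap in limits.items():
        # Throttle the whole category to at most `cap` submitted jobs.
        lines.append(f"MAXJOBS {stage} {cap}")
    return lines


if __name__ == "__main__":
    # Small example; the limits I have in mind are 500 for A and 500 for B.
    print("\n".join(category_lines(2, {"A": 500, "B": 500})))
```

These lines would be appended to the DAG file after the JOB lines for the nodes they name.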

Thanks,

Brian.