[HTCondor-users] Order in which dagman queues jobs
- Date: Thu, 05 Dec 2013 21:55:41 +0000
- From: Brian Candler <b.candler@xxxxxxxxx>
- Subject: [HTCondor-users] Order in which dagman queues jobs
From experimentation, it seems that dagman queues up jobs in the order
that they become ready. Is this true, and is there any way to change this?
Let me explain what I'm doing. I have a DAG that has a number of independent
job threads, each of which is a linear chain of nodes, i.e. something like:
A1 -> B1 -> C1 -> D1
A2 -> B2 -> C2 -> D2
A3 -> B3 -> C3 -> D3
...
A1000 -> B1000 -> C1000 -> D1000
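For reference, a DAG of this shape can be generated rather than written by hand. This is a minimal sketch using only the DAGMan `JOB` and `PARENT ... CHILD` commands; the submit file names `a.sub`, `b.sub`, `c.sub` and `d.sub` and the helper `chain_dag` are hypothetical placeholders, not the actual files from this setup.

```python
"""Generate a DAGMan input file made of independent linear chains A -> B -> C -> D.

Assumption: a.sub, b.sub, c.sub and d.sub are hypothetical submit file names.
"""

STAGES = ["A", "B", "C", "D"]


def chain_dag(n_chains: int, stages=STAGES) -> str:
    """Return DAG file text with n_chains chains, one node per stage per chain."""
    lines = []
    for i in range(1, n_chains + 1):
        # One JOB line per node, e.g. "JOB A1 a.sub"
        for stage in stages:
            lines.append(f"JOB {stage}{i} {stage.lower()}.sub")
        # Link consecutive stages within the same chain only, e.g. "PARENT A1 CHILD B1"
        for parent, child in zip(stages, stages[1:]):
            lines.append(f"PARENT {parent}{i} CHILD {child}{i}")
    return "\n".join(lines) + "\n"
```

Writing `chain_dag(1000)` to a file such as `chains.dag` and running `condor_submit_dag chains.dag` would give the 1000-chain layout described above. No edges cross between chains, so each chain can progress independently.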
The 'A' jobs complete very quickly, each within a second or two; dagman
can't submit them into the queue fast enough. The B and C jobs are
relatively long-running and compute-intensive, and the D jobs are quite
short.
What I'm discovering from watching progress is:
- Even when some of the A jobs have completed (and therefore the related
B jobs are ready to run), dagman continues to submit all the remaining A
jobs before it starts to submit any B jobs. Therefore these
compute-heavy jobs don't start to run as soon as they might.
- Things move more or less in lock step (i.e. there's a phase when A
jobs are running, then B jobs are running, then C jobs are running, etc.).
- At the end, when the D jobs are running, the queue empties out because
these jobs are short, and again dagman can't submit jobs fast enough.
Obviously one thing I need to do is to get dagman to push jobs into the
queue faster, and I'm going to investigate some of the ideas at
However, in my case it would also be helpful if dagman queued up jobs in
a different order: for example, when an 'A' job completes, queue up
its corresponding 'B' job in preference to another 'A' job. This would
mix the workload better over the lifetime of the DAG, and some of the
completed results would also come out sooner.
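The difference between the observed behaviour and the one I'd like can be illustrated with a toy simulation. It is not DAGMan's real scheduler: it assumes, purely for illustration, that a fixed number of ready nodes is submitted per tick, that every submitted job starts immediately, and that stages take the made-up tick counts in `DURATION`. `policy="fifo"` models the "submit in the order nodes became ready" behaviour seen above; `policy="depth"` is a hypothetical alternative that prefers nodes further along their chain.

```python
import heapq
import itertools

STAGES = "ABCD"
# Hypothetical durations in ticks: A and D short, B and C long.
DURATION = {"A": 1, "B": 20, "C": 20, "D": 1}


def simulate(n_chains, submits_per_tick, policy):
    """Return node names in the order this toy scheduler submits them.

    policy == "fifo":  ready nodes are submitted in the order they became ready.
    policy == "depth": later stages first (D > C > B > A), ties broken by chain.
    """
    seq = itertools.count()
    ready = []  # min-heap of (key, stage_idx, chain)

    def make_ready(stage_idx, chain):
        if policy == "fifo":
            key = (next(seq),)  # arrival order
        else:
            key = (-stage_idx, chain)  # negated so deeper stages pop first
        heapq.heappush(ready, (key, stage_idx, chain))

    for chain in range(1, n_chains + 1):
        make_ready(0, chain)

    running = []  # min-heap of (finish_tick, stage_idx, chain)
    order = []
    tick = 0
    while ready or running:
        # Jobs finishing by this tick make the next node in their chain ready.
        while running and running[0][0] <= tick:
            _, stage_idx, chain = heapq.heappop(running)
            if stage_idx + 1 < len(STAGES):
                make_ready(stage_idx + 1, chain)
        # Submit at most submits_per_tick ready nodes this tick.
        for _ in range(min(submits_per_tick, len(ready))):
            _, stage_idx, chain = heapq.heappop(ready)
            stage = STAGES[stage_idx]
            order.append(f"{stage}{chain}")
            heapq.heappush(running, (tick + DURATION[stage], stage_idx, chain))
        tick += 1
    return order
```

With `simulate(100, 5, "fifo")`, every A node is submitted before the first B node, which matches the lock-step phases described above. With `simulate(100, 5, "depth")`, B1 to B5 are submitted on the second tick, ahead of the remaining A nodes, which is the mixing I'm after.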
I've read through
and can't find anything relevant. I will try setting categories (e.g.
max 500 'A' jobs, max 500 'B' jobs at any one time), but that's not
exactly what I'm looking for.
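For the categories approach, a sketch of the extra DAG lines might look like the following. It uses the DAGMan `CATEGORY JobName CategoryName` and `MAXJOBS CategoryName MaxJobsValue` commands; the helper `throttle_lines` and the category names `Ajobs`/`Bjobs` are hypothetical, and the limits of 500 are just the example figures from above.

```python
def throttle_lines(n_chains, limits=None):
    """Return DAGMan CATEGORY/MAXJOBS lines that throttle selected stages.

    limits maps a stage letter to the maximum number of that stage's node jobs
    DAGMan should have submitted at once; stages not listed stay unthrottled.
    """
    if limits is None:
        limits = {"A": 500, "B": 500}  # example limits from the text
    lines = []
    for stage, limit in limits.items():
        category = f"{stage}jobs"  # hypothetical category name
        for i in range(1, n_chains + 1):
            # Put every node of this stage, e.g. A1..A1000, into its category.
            lines.append(f"CATEGORY {stage}{i} {category}")
        lines.append(f"MAXJOBS {category} {limit}")
    return "\n".join(lines) + "\n"
```

These lines would be appended to the DAG input file. Note that a cap like this only limits how many nodes of each stage are in the queue at once; it does not make dagman prefer a B node over an A node when both are ready, which is why it isn't quite the ordering described above.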