
[HTCondor-users] Documentation of DAG priorities

I would like to raise something that is unclear in the documentation: the idea of "priority".

At first glance it seems that there are two distinct and unrelated priorities.

(1) The job priority, which is set in the submit file with "priority = XXX" or using the condor_prio command line tool, and is reflected in the JobPrio ClassAd attribute.

When there is a choice of jobs in the queue to be run next, for the same user/accounting group, then the negotiator will pick the one with the highest priority.


"priority = <integer>
An HTCondor job priority can be any integer, with 0 being the default. Jobs with higher numerical priority will run before jobs with lower numerical priority. Note that this priority is on a per user basis. One user with many jobs may use this command to order his/her own jobs, and this will have no effect on whether or not these jobs will run ahead of another user's jobs."

(2) The DAG node priority, which is set using the "PRIORITY <node> <value>" statement in the DAG file.

When condor_dagman has a choice of which jobs to submit next into the queue, it chooses the one with the highest priority.


"The node priority affects the order in which nodes that are ready at the same time will be submitted. Note that node priority does not override the DAG dependencies.

Node priority is mainly relevant if node submission is throttled via the -maxjobs or -maxidle command-line arguments or the DAGMAN_MAX_JOBS_SUBMITTED or DAGMAN_MAX_JOBS_IDLE configuration variables."
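To make the quoted behaviour concrete, here is a toy model in Python of choosing which ready nodes to submit under a throttle. It is only a sketch of the documented ordering, not DAGMan's actual implementation; the function name, the example priorities and the tie-breaking rule (input order) are all assumptions made for illustration.

```python
def submission_order(ready_nodes, priorities, max_submit):
    """Return up to max_submit ready nodes, highest node priority first.

    ready_nodes: node names whose dependencies are already satisfied.
    priorities:  node name -> integer priority; nodes not listed are
                 assumed to have priority 0.
    max_submit:  throttle on how many nodes may be submitted now
                 (a stand-in for limits such as -maxjobs).
    """
    # sorted() is stable, so nodes with equal priority keep their input
    # order; how real DAGMan breaks ties is not modelled here.
    ranked = sorted(ready_nodes, key=lambda n: priorities.get(n, 0), reverse=True)
    return ranked[:max_submit]


if __name__ == "__main__":
    # Hypothetical priorities for three nodes that are ready at the same time.
    print(submission_order(["B", "C", "D"], {"B": 10, "C": 30, "D": 20}, 2))
```

Without a throttle every ready node is submitted anyway, which is why the documentation says the node priority mainly matters when submission is limited.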

Now, the documentation doesn't say at this point if the DAG node priority is related to the job priority. However, testing shows that it is (see end of message). What appears to happen is that the DAG node priority is also being used as the job priority. Not only that, but the DAG node priority *overrides* any job priority set explicitly in the submit file.

On reading the documentation again, there is a hint of this behaviour where it says later on:

"There is no way to change the priority in the submit description file for a job, as DAGMan will override any priority command placed in a submit description file."

But until that point, I couldn't find any indication that the DAG node priority was in any way related to the job priority.

This becomes important when I've made a DAG with explicit node priorities - influencing the order in which things are run to optimise throughput - and then decide that I want one DAG to take priority over another. It seems that I now have to apply an offset to every DAG node priority.
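Applying such an offset by hand is tedious, so a small script could do it. The following Python is a minimal sketch under some assumptions: it only rewrites lines of the form "PRIORITY <node> <value>", it assumes the keyword may be written in any letter case, and the function name and sample node names are made up for the example. Nodes that have no PRIORITY line are left alone, so lines would have to be added for them separately if the offset should apply to them too.

```python
import re

# Matches "PRIORITY <node> <value>"; matching the keyword in any letter
# case is an assumption of this sketch.
_PRIORITY_RE = re.compile(r"^(\s*PRIORITY\s+\S+\s+)([+-]?\d+)(\s*)$", re.IGNORECASE)


def offset_dag_priorities(dag_text, offset):
    """Return dag_text with offset added to every PRIORITY value."""
    out = []
    for line in dag_text.splitlines():
        m = _PRIORITY_RE.match(line)
        if m:
            # Keep the keyword and node name as written; replace the number.
            line = f"{m.group(1)}{int(m.group(2)) + offset}{m.group(3)}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical DAG fragment, not the test DAG from this message.
    sample = "JOB X x.sub\nJOB Y y.sub\nPRIORITY X 10\nPRIORITY Y -3"
    print(offset_dag_priorities(sample, 100))
```

Running one DAG through this with a large offset before submitting would lift all of its nodes above those of another DAG while keeping the relative order within the DAG unchanged.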

I don't really have a problem with this behaviour - indeed it would probably be confusing to have two distinct priorities - but I think it could be made a lot clearer in the documentation for condor_dagman. The following would do the trick:

"The PRIORITY keyword assigns a priority to a DAG node. The syntax for PRIORITY is

PRIORITY JobName PriorityValue

The node priority affects the order in which nodes that are ready at the same time will be submitted, and it also sets the job's priority in the queue, i.e. its JobPrio attribute. Note that node priority does not override the DAG dependencies."



This was tested with HTCondor 8.2.4 under Ubuntu 12.04, on a single-CPU VM so that only one job runs at a time. Here are the files:

==> pri.dag <==
JOB A pri0.sub

JOB B pri1.sub
JOB C pri2.sub
JOB D pri3.sub


==> pri0.sub <==
executable = /bin/sleep
arguments = 30
priority = 0

==> pri1.sub <==
executable = /bin/sleep
arguments = 120
priority = 1

==> pri2.sub <==
executable = /bin/sleep
arguments = 120
priority = 2

==> pri3.sub <==
executable = /bin/sleep
arguments = 120
priority = 3

When I submit this DAG, node C runs after node A, and it runs with priority 30, not the "priority = 2" set in pri2.sub.

$ condor_q
18223.0   brian          10/5  09:57   0+00:01:54 R  0   0.3  condor_dagman -f -
18225.0   brian          10/5  09:59   0+00:00:02 R  30  0.0  sleep 120
18226.0   brian          10/5  09:59   0+00:00:00 I  20  0.0  sleep 120
18227.0   brian          10/5  09:59   0+00:00:00 I  10  0.0  sleep 120

$ condor_q -run -dag
18223.0   brian          10/5  09:57   0+00:02:02 foo.example.com
18225.0    |-C           10/5  09:59   0+00:00:10 slot1@xxxxxxxxxxxxxxx

$ condor_q -long |grep JobPrio
JobPrio = 10
JobPrio = 20
JobPrio = 0
JobPrio = 30