
Re: [HTCondor-users] Execute last DAGMan job as soon as possible



It sounds like what you’re looking for is accounting groups. You’d define an accounting group with a very low priority factor, e.g., “group_urgent”, in the pool configuration and assign your final node to that group:


GROUP_NAMES = group_urgent

GROUP_PRIO_FACTOR_group_urgent  = 1.0



In the submit description of your final DAG node, the one that handles the post-processing, you’d set the following:


Accounting_group = group_urgent


That final job would then very likely be first in line for the next matching machine resource, because its effective user priority (real user priority times the priority factor of 1.0) would probably be lower, and lower is better, than that of other jobs using the default_prio_factor of 1000 (v8.4).
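To make the arithmetic concrete, here is a rough Python sketch of that comparison. It is only a simplified model of what the negotiator does, and the real user priority values are made up for illustration; the only numbers taken from the above are the two priority factors (1.0 and 1000).

```python
# Simplified model: effective priority = real user priority * priority factor.
# A LOWER effective priority is better. Real priorities below are hypothetical.

def effective_priority(real_priority: float, factor: float) -> float:
    """Return the effective user priority for one submitter."""
    return real_priority * factor

# Assumed scenario: the urgent submitter has a fairly poor real priority (20.0),
# while a regular user has the best possible real priority (0.5).
urgent = effective_priority(real_priority=20.0, factor=1.0)      # group_urgent
regular = effective_priority(real_priority=0.5, factor=1000.0)   # default factor

print(f"group_urgent: {urgent}, default user: {regular}")
print("urgent job is served first:", urgent < regular)
```

Even with a much worse real priority, the group with factor 1.0 still comes out well ahead of a user at the default factor, which is why the final node would usually get the next matching slot.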


The “priority” setting in the submit description only orders jobs from the same owner relative to one another; it has no effect on how your jobs rank against another user’s. You’d use this, for example, if you had a pile of 10,000 runs waiting in the queue, but needed to get a few validation runs through before all of those 10,000 are finished.
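A small Python sketch of that per-owner behaviour, again as a simplified model and not HTCondor's actual code. The owners, job names, and priority values are invented for the example.

```python
from collections import defaultdict

# (owner, job_id, job_priority) tuples; every value here is hypothetical.
queue = [
    ("alice", "bulk-0001", 0),
    ("alice", "bulk-0002", 0),
    ("alice", "validate-1", 10),
    ("bob", "sim-1", -5),
]

def order_within_owner(jobs):
    """Group jobs by owner and sort each owner's jobs, highest priority first."""
    by_owner = defaultdict(list)
    for owner, job_id, prio in jobs:
        by_owner[owner].append((prio, job_id))
    # sorted() is stable, so jobs with equal priority keep their queue order.
    return {
        owner: [job_id for _, job_id in sorted(items, key=lambda item: -item[0])]
        for owner, items in by_owner.items()
    }

ordered = order_within_owner(queue)
print(ordered["alice"])  # validation run moves ahead of alice's bulk runs
print(ordered["bob"])    # bob's job is never compared against alice's
```

In this model the validation run jumps ahead of the same owner's bulk runs, while the job priorities of different owners are never compared with each other; which owner gets served first is decided by user priority instead.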


                -Michael Pelletier.



From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Szabolcs Horvátth
Sent: Tuesday, November 29, 2016 5:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Execute last DAGMan job as soon as possible



What is the fastest way to start a job in a Condor pool where machine rank, user priority factor and job priority vary a lot?

We use DAGMan graphs where the last job depends on the execution of all previously submitted DAG jobs. This last job does some post-processing on the data generated by the DAG, and it can take some time, so it's not something that I'd like to execute on the Scheduler machine. But it would be important to start this post-process as fast as possible, regardless of the priority of the submitting user. I tried setting high machine rank and high job priority but I still see lots of these jobs waiting while other jobs get started. The best solution would be to skip matchmaking altogether and execute the job right away but I didn't find a reliable way to do that.