
Re: [HTCondor-users] Execute last DAGMan job as soon as possible

I'd suggest double-checking your default priority factor value. As I recall it was only raised to 1,000 in the 8.4 release; if you're using 8.2, it might still be 100 (if I'm remembering correctly). And if your jobs are racking up a lot of slot time across many machines, the final job may still have a higher effective user priority (EUP) even with a priority factor of 1.


Perhaps if you also specify a dummy accounting_group_user for the end jobs, they'd wind up in a different usage basket than the rest of the DAG and wouldn't be penalized for the large usage the rest of the DAG incurred?
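For example (the user name here is just a placeholder I made up), the final node's submit description might contain something like:

    accounting_group = group_urgent
    accounting_group_user = dag_finalizer

so that its accumulated usage is tallied under that separate identity rather than against the submitting owner's record.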


You can use condor_userprio to check on all this.
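Something along these lines should show the picture (exact columns vary by version):

    condor_userprio -all

That lists each submitter's real priority, priority factor, effective priority, and accumulated usage, so you can confirm the final job's identity really is landing in its own basket with the factor you expect.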


                -Michael Pelletier.


From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Szabolcs Horváth
Sent: Wednesday, November 30, 2016 12:18 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Execute last DAGMan job as soon as possible


Hi Michael,

Thanks for the tip! I tried setting up group accounting and it solved most of my problems, although it still takes more time to start the end jobs than I'd expect.

We have a negotiation cycle every 30 seconds but it takes much longer time to match slots to these jobs (around 10-15 minutes), even though there are idle slots that get matched to other jobs.

Maybe there are some claims hanging on to these slots?





On Tue, Nov 29, 2016 at 7:12 PM, Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:



It sounds like what you're looking for is accounting groups. You'd set up an accounting group with a very low priority factor, e.g. "group_urgent", and assign your final node to that group:


GROUP_NAMES = group_urgent

GROUP_PRIO_FACTOR_group_urgent  = 1.0



In your final DAG node, which handles the post-processing, you'd set the following in its submit description:


accounting_group = group_urgent


And then that final job would very likely be first in line for the next matching machine resource, because its effective user priority (real user priority times its priority factor of 1.0) would likely be lower than that of other jobs using the DEFAULT_PRIO_FACTOR of 1000 (as of v8.4).
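To make the arithmetic concrete (all numbers invented for illustration):

    real priority 50 × factor 1.0  = EUP 50     (group_urgent)
    real priority  5 × factor 1000 = EUP 5000   (default factor)

The negotiator favors the lower effective priority, so even a heavy user's final job in group_urgent would be considered ahead of lightly-used submitters under the default factor.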


The priority set via the submit file's "priority" command only orders jobs belonging to the same owner. You'd use this, for example, if you had a pile of 10,000 runs waiting in the queue but needed to get a few validation runs through before all of those 10,000 finish.


                -Michael Pelletier.



From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Szabolcs Horváth
Sent: Tuesday, November 29, 2016 5:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Execute last DAGMan job as soon as possible



What is the fastest way to start a job in a Condor pool where machine rank, user priority factor, and job priority all vary a lot?

We use DAGMan graphs where the last job depends on the execution of all previously submitted DAG jobs. This last job does some post-processing on the data generated by the DAG, and it can take some time, so it's not something I'd like to execute on the scheduler machine. But it's important to start this post-processing as fast as possible, regardless of the submitting user's priority. I tried setting a high machine rank and a high job priority, but I still see many of these jobs wait while other jobs get started. The best solution would be to skip matchmaking altogether and execute the job right away, but I didn't find a reliable way to do that.


