
Re: [HTCondor-users] Execute last DAGMan job as soon as possible



Hi Michael,

Thanks for the tip! I tried setting up group accounting and it solved most of my problems, although it still takes more time to start the end jobs than I'd expect.
We have a negotiation cycle every 30 seconds, but it takes much longer (around 10-15 minutes) to match slots to these jobs, even though there are idle slots that get matched to other jobs.
Maybe there are some claims hanging on to these slots?

Cheers,
Szabolcs

On Tue, Nov 29, 2016 at 7:12 PM, Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:

Hello,


It sounds like what you're looking for is accounting groups. You'd set up an accounting group which has a very low priority factor, e.g., "group_urgent", and assign your final node to that group:


GROUP_NAMES = group_urgent

GROUP_PRIO_FACTOR_group_urgent = 1.0

GROUP_AUTOREGROUP = True


In your final DAG node which handles the post-processing, you'd set the following in the submit description for it:


Accounting_group = group_urgent


And then that final job would very likely be first in line to get the next matching machine resource, because its effective user priority (real user priority times the priority factor of 1.0) would likely be numerically lower, and therefore better, than that of other jobs using the DEFAULT_PRIO_FACTOR of 1000 (v8.4).
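
As a rough illustration of that arithmetic (a sketch only, not HTCondor code): the factors 1.0 and 1000 come from the text above, while the submitter names and real user priority values are made-up assumptions. A numerically lower effective priority is served first.

```python
# Illustrative sketch only; this is not HTCondor code.
# The factors 1.0 and 1000 come from the discussion above; the real user
# priority values are hypothetical numbers chosen for the example.

def effective_priority(real_priority: float, priority_factor: float) -> float:
    """Effective user priority = real user priority * priority factor."""
    return real_priority * priority_factor

# Hypothetical submitters: (real user priority, priority factor)
submitters = {
    "group_urgent": (5.0, 1.0),     # accounting group with factor 1.0
    "regular_user": (0.5, 1000.0),  # default factor of 1000
}

effective = {
    name: effective_priority(real, factor)
    for name, (real, factor) in submitters.items()
}

# A numerically lower effective priority is better, so sort ascending.
order = sorted(effective, key=effective.get)
print(order)
```

Even with a worse real user priority (5.0 versus 0.5 in this made-up case), the group with factor 1.0 ends up ahead because the default factor multiplies the other submitter's value by 1000.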


The priority set in the submit "priority" setting only applies to jobs from the same owner. You'd use this, for example, if you had a pile of 10,000 runs waiting in the queue, but needed to get a few validation runs through before all of those 10,000 are finished.
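
A toy model of that behaviour, assuming a simplified queue (the owners, job names, and priority values are all hypothetical, and this is not how HTCondor is implemented): job priority reorders jobs only within one owner's list, with higher values going first.

```python
from collections import defaultdict

# Hypothetical queue entries: (owner, job name, submit-file priority).
queue = [
    ("alice", "bulk_run_1", 0),
    ("alice", "bulk_run_2", 0),
    ("alice", "validation_run", 10),
    ("bob", "bob_job", 20),
]

# Group jobs by owner, because job priority is only compared among
# jobs that belong to the same owner.
by_owner = defaultdict(list)
for owner, name, prio in queue:
    by_owner[owner].append((name, prio))

# Within each owner, higher job priority goes first. sorted() is stable,
# so jobs with equal priority keep their submission order.
per_owner_order = {
    owner: [name for name, _ in sorted(jobs, key=lambda job: -job[1])]
    for owner, jobs in by_owner.items()
}
print(per_owner_order)
```

Note that bob's job priority of 20 says nothing about whether bob's job runs before alice's; that ordering is decided by user priority, not by the job priority setting.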


        -Michael Pelletier.


From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Szabolcs Horvátth
Sent: Tuesday, November 29, 2016 5:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Execute last DAGMan job as soon as possible


Hi,

What is the fastest way to start a job in a Condor pool where machine rank, user priority factor and job priority vary a lot?

We use DAGMan graphs where the last job depends on the execution of all previously submitted DAG jobs. This last job does some post-processing on the data generated by the DAG, and it can take some time, so it's not something that I'd like to execute on the scheduler machine. But it would be important to start this post-processing as fast as possible, regardless of the priority of the submitting user. I tried setting a high machine rank and a high job priority, but I still see lots of these jobs waiting while other jobs get started. The best solution would be to skip matchmaking altogether and execute the job right away, but I didn't find a reliable way to do that.

Cheers,

Szabolcs

