
Re: [HTCondor-users] execute host priority



On Thu, Feb 8, 2018 at 4:20 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> On 2/8/2018 11:40 AM, Larry Martell wrote:
>> I would not have thought giving one execute host priority over another
>> would be that difficult. But I have googled this and I see it's been
>> asked quite a few times, with no clear answer ever given.
>
> Reading the below, it sounds like you have two machines; let's call them machineA and machineB.  You want to prefer machineA, such that you will only use machineB once machineA is out of cores.

Correct.

> I'll assume you are running HTCondor v8.6.x or later.

Using 8.6.8.

> I suggest you append the following into the condor_config[.local] on your central manager and then do a condor_reconfig:
>
>     # Sort jobs first according to machine rank.  Next prefer
>     # slots that are unclaimed (idle) over slots that are already
>     # running a job.
>     # And finally, prefer to use idle slots on machineA.somewhere.com
>     # before using others.  The comparison is parenthesized because
>     # "*" binds more tightly than "=?=".
>     NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + \
>       (1000000 * (RemoteOwner =?= UNDEFINED)) + \
>       (100 * (Machine =?= "machineA.somewhere.com"))
>     # If using partitionable slots, fill depth first.  See http://tinyurl.com/y75k3k7p
>     NEGOTIATOR_DEPTH_FIRST = True

Thanks - this seems to be doing what I want!
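
To see why the weights in that expression produce the ordering I asked for, here is a rough Python sketch of the arithmetic. It is only a model, not HTCondor code: the slot names, the owner "larry", and the function pre_job_rank are made up for illustration, and machine rank is assumed to be 0 everywhere.

```python
# Hypothetical model of the suggested NEGOTIATOR_PRE_JOB_RANK weights.
# This is plain Python arithmetic, not anything HTCondor itself runs.

PREFERRED = "machineA.somewhere.com"  # host name taken from the suggestion

def pre_job_rank(machine_rank, remote_owner, machine):
    """Mimic the three weighted terms; higher means tried first."""
    unclaimed = remote_owner is None       # models RemoteOwner =?= UNDEFINED
    on_preferred = machine == PREFERRED    # models Machine =?= "machineA..."
    return (10_000_000 * machine_rank
            + 1_000_000 * int(unclaimed)
            + 100 * int(on_preferred))

# Made-up slots: two idle ones and one already running a job.
slots = [
    {"name": "slot1@machineB", "machine": "machineB.somewhere.com",
     "rank": 0, "owner": None},
    {"name": "slot1@machineA", "machine": PREFERRED,
     "rank": 0, "owner": None},
    {"name": "slot2@machineA", "machine": PREFERRED,
     "rank": 0, "owner": "larry"},
]

ordered = sorted(
    slots,
    key=lambda s: pre_job_rank(s["rank"], s["owner"], s["machine"]),
    reverse=True,
)
names = [s["name"] for s in ordered]
print(names)
```

Because each weight is much larger than the one after it, an idle slot always beats a busy one, and the machineA term only decides between slots that are otherwise tied.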

> You can learn more about the "NEGOTIATOR_PRE_JOB_RANK" knob, and any other knob, by looking for the knob name in the index of the HTCondor Manual.

Yes, I read that description, but how could someone, from reading
this, possibly know that the setting would be the expression you
provided?

NEGOTIATOR_PRE_JOB_RANK
Resources that match a request are first sorted by this expression. If
there are any ties in the rank of the top choice, the top resources
are sorted by the user-supplied rank in the job ClassAd, then
by NEGOTIATOR_POST_JOB_RANK, then by PREEMPTION_RANK (if the match
would cause preemption and there are still any ties in the top
choice). MY refers to attributes of the machine ClassAd and TARGET
refers to the job ClassAd. The purpose of the pre job rank is to allow
the pool administrator to override any other rankings, in order to
optimize overall throughput. For example, it is commonly used to
minimize preemption, even if the job rank prefers a machine that is
busy. If undefined, this expression has no effect on the ranking of
matches.
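
For what it's worth, the tie-breaking order in that description can be modeled as a sort on a tuple of keys. The sketch below is a simplification under stated assumptions: the Match type, the slot names, and the numbers are invented, and the PREEMPTION_RANK step is left out because it only applies when a match would cause preemption.

```python
# Simplified model of the documented ordering: pre job rank first, then the
# job's own rank, then post job rank. Names and values are hypothetical.
from typing import NamedTuple

class Match(NamedTuple):
    name: str
    pre_job_rank: float   # stands in for NEGOTIATOR_PRE_JOB_RANK
    job_rank: float       # stands in for the user-supplied rank in the job ad
    post_job_rank: float  # stands in for NEGOTIATOR_POST_JOB_RANK

def best_first(matches):
    """Sort so that later keys only matter when earlier keys are tied."""
    return sorted(
        matches,
        key=lambda m: (m.pre_job_rank, m.job_rank, m.post_job_rank),
        reverse=True,
    )

candidates = [
    Match("slotA", 100.0, 1.0, 0.0),
    Match("slotB", 100.0, 5.0, 0.0),  # ties slotA on pre job rank
    Match("slotC", 50.0, 9.0, 9.0),   # best job rank, but loses on pre job rank
]

order = [m.name for m in best_first(candidates)]
print(order)
```

Here slotC has the highest job rank yet comes last, which is the "override any other rankings" behavior the description mentions.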

> Be warned, I didn't take the time to explicitly test the above, but I think the odds are good it
> will do what you asked.
>
> p.s. TJ asked if your execute nodes are configured to use partitionable slots or static slots.  Static slots are currently the default, so if you do not know what this means, the answer is probably static slots. :) Info on static vs partitionable slots in the Manual is at
>   http://tinyurl.com/y8mwzykz
> I think the above suggestion will work regardless of a static or partitionable slot configuration, but telling folks this up front whenever you ask a scheduling configuration question is helpful.
>
> hope this helps,

Hugely! Thanks again.


>> On Wed, Feb 7, 2018 at 12:26 PM, Larry Martell <larry.martell@xxxxxxxxx> wrote:
>>> How do I use this setting? From googling I tried this on the machine I
>>> want to have lower priority:
>>>
>>> NEGOTIATOR_PRE_JOB_RANK = - Memory
>>>
>>> But it had no effect - that machine was still chosen first.
>>>
>>> On Thu, Jan 18, 2018 at 5:59 PM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>>>> Use NEGOTIATOR_PRE_JOB_RANK if you want to choose which machine is given to
>>>> the schedd first.
>>>>
>>>>
>>>>
>>>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
>>>> Of Larry Martell
>>>> Sent: Thursday, January 18, 2018 4:21 PM
>>>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>>>> Subject: Re: [HTCondor-users] execute host priority
>>>>
>>>>
>>>>
>>>> Do I put that on just one host (the one I want it to use first)?
>>>>
>>>>
>>>>
>>>> On Thu, Jan 18, 2018 at 5:09 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>>>>
>>>> This will happen naturally if your execute nodes are configured to use
>>>> partitionable slots, with CLAIM_PARTITIONABLE_LEFTOVERS=true in the schedd.
>>>>
>>>> The schedd will get handed the entire partitionable slot, and it will split
>>>> it up and start as many jobs on it as it can before it will try to use the
>>>> other partitionable slot.
>>>>
>>>> If there are more than 136 jobs in the queue, then the second execute
>>>> machine will end up getting used for the remaining jobs.
>>>>
>>>> -tj
>>>>
>>>> -----Original Message-----
>>>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
>>>> Of Larry Martell
>>>> Sent: Thursday, January 18, 2018 6:14 AM
>>>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>>>> Subject: [HTCondor-users] execute host priority
>>>>
>>>> I have 2 execute hosts defined each with 136 cores - how can I make my
>>>> submit host use all the cores on one before starting to use the other?