
Re: [HTCondor-users] Changing slotweight for few nodes in pool



Thanks Steve.

After reading your comment and going through the presentation [1], I understand the meaning of "Note that the only attributes valid within the SLOT_WEIGHT expression are Cpus, Memory, and disk. This must the set to the same value on all machines in the pool."

This expression has been in use for a long time. I guess keeping the core count might impact the accounting group quotas, which is why 1 is used.

I ran a simple test in the lab, where the default SLOT_WEIGHT was Cpus. I have two execute nodes. On the node with the default slot weight (Cpus), users testold1 and testnew1 ran the job; on the other node, with slot weight = (Cpus * 1.2), users testold2 and testnew2 ran the job.

From the two tests, I can see that users running on the machine with the higher slot weight always consume 4 resources, whereas with the default slot weight the resources in use are 3. I am not sure why the effective priority for testnew2 is lower than for testnew1; it should actually be higher. Are you talking about the calculation of effective priority or total

# condor_userprio
Last Priority Update:  8/14 07:01
                                  Effective  Priority    Res  Total Usage Time Since
User Name                          Priority    Factor In Use (wghted-hrs) Last Usage
------------------------------ ------------ --------- ------ ------------ ----------
testold1@testdomain                  500.40   1000.00      3         0.02      <now>
testold2@testdomain                  501.49   1000.00      4         0.06      <now>
testnew1@testdomain                  501.20   1000.00      3         0.05      <now>
testnew2@testdomain                  500.50   1000.00      4         0.02    0+00:01
------------------------------ ------------ --------- ------ ------------ ----------
Number of users: 4                                        13         0.15    0+23:54

In the statement "The other thing that can happen is that condor_userprio doesn't accurately show the effects of a floating-point slot weight; it only reports integer resources used, but the underlying math is right",
are you referring to the effective priority or the resources in use? Could you please explain it a bit more?


On Wed, Aug 14, 2019 at 7:02 AM Steven C Timm <timm@xxxxxxxx> wrote:

We are doing something like what you are doing at Fermilab. Basically, our slot-weight expression charges the user by the CPUs or the number of 2 GB memory chunks, whichever is higher, i.e. 1 CPU with 2 GB = 1, 1 CPU with 3 GB = 1.5, 1 CPU with 4 GB = 2, 2 CPUs with 2 GB = 2, and so forth.
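
The charging rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual Fermilab SLOT_WEIGHT expression; the function name `slot_weight`, the parameter names, and the assumption that memory is given in MB with one chunk equal to 2048 MB are all hypothetical.

```python
# Hypothetical sketch of "CPUs or 2 GB memory chunks, whichever is higher".
# Assumes memory is expressed in MB and one chunk is 2048 MB.

def slot_weight(cpus, memory_mb, chunk_mb=2048):
    """Return the larger of the CPU count and the number of memory chunks."""
    return max(float(cpus), memory_mb / chunk_mb)

# The four cases listed in the message above.
for cpus, memory_mb in [(1, 2048), (1, 3072), (1, 4096), (2, 2048)]:
    print(f"{cpus} CPU(s), {memory_mb} MB -> weight {slot_weight(cpus, memory_mb)}")
```

With these inputs the sketch yields weights of 1, 1.5, 2, and 2, matching the examples given in the message.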

What I don't understand is why you would set the weight of the partitionable slot to 1; it should be set to however many CPUs remain in it at the time.


The trap is that if you have a lot of small submitters, sometimes the slot weight of the slots will be so big that a submitter with a low limit will never get a slot. In theory the negotiator could hand either an existing dynamic slot or the whole partitionable slot to the schedd. In practice it is more the latter that we see. We see the effect that small submitters sometimes get frozen out, but they recently put in a patch to fix most of the problem.


The other thing that can happen is that condor_userprio doesn't accurately show the effects of a floating-point slot weight; it only reports integer resources used, but the underlying math is right.
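
As a rough illustration of this rounding effect, the Python sketch below assumes each test user held three single-core slots and that condor_userprio rounds the weighted total to the nearest integer for display. Both are assumptions rather than confirmed behavior, and the function names are hypothetical. Under those assumptions, a weight of Cpus * 1.2 gives 3.6, which would display as 4, while the accounting keeps the fractional value.

```python
# Hypothetical illustration; none of these names are part of HTCondor.

def weighted_usage(cpus_per_slot, factor=1.0):
    """Sum of per-slot weights, where each slot is charged cpus * factor."""
    return sum(cpus * factor for cpus in cpus_per_slot)

def displayed_usage(weighted):
    """Assumed display rule: round the weighted total to the nearest integer."""
    return round(weighted)

default_user = weighted_usage([1, 1, 1])         # SLOT_WEIGHT = Cpus
weighted_user = weighted_usage([1, 1, 1], 1.2)   # SLOT_WEIGHT = Cpus * 1.2

# Two users on each kind of node, as in the lab test in this thread.
pool_total = 2 * default_user + 2 * weighted_user

print("default node user shows:", displayed_usage(default_user))
print("weighted node user shows:", displayed_usage(weighted_user))
print("pool total shows:", displayed_usage(pool_total))
```

If these assumptions hold, the per-user values would show as 3 and 4, and the pool total of about 13.2 would show as 13 rather than 14, which would be consistent with the condor_userprio output in the reply above.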


Finally, be sure that the slot weight variable never ends up undefined; major craziness can happen then.


Steve Timm



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Sent: Tuesday, August 13, 2019 11:12:20 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Changing slotweight for few nodes in pool
Hello Experts,

A gentle follow-up email.

On Mon, 12 Aug, 2019, 19:52 Vikrant Aggarwal, <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

We are using partitionable slots in our setup. We introduced some high-memory nodes into our pool, for which we want to charge more; we charge based on CPU core usage. I am planning to multiply the default slot weight value, Cpus, by a float to increase the user priority, which is used for charging.

From:

SLOT_WEIGHT = ifThenElse(SlotType == "Partitionable", 1, Cpus)

To something like the following, so that a few nodes in the pool would have this slot weight:

SLOT_WEIGHT = ifThenElse(SlotType == "Partitionable", 1, Cpus * 1.2)

I read the following in HTCondor manual:

Enable use of the condor_negotiator-side resource consumption policy, allocating the job-requested number of cores to the dynamic slot, and use SLOT_WEIGHT to assess the user usage that will affect user priority by the number of cores allocated. Note that the only attributes valid within the SLOT_WEIGHT expression are Cpus, Memory, and disk. This must the set to the same value on all machines in the pool.

If I change the SLOT_WEIGHT on a few nodes in the cluster, am I inviting unknown issues or limitations?

Thanks & Regards,
Vikrant Aggarwal
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/