
Re: [HTCondor-users] Changing slotweight for few nodes in pool



We are doing something like what you are doing at Fermilab. Basically, our slot-weight expression charges the user by the number of CPUs or the number of 2 GB memory chunks, whichever is higher, i.e. 1 CPU 2 GB = 1, 1 CPU 3 GB = 1.5, 1 CPU 4 GB = 2, 2 CPU 2 GB = 2, and so forth.
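
To make the arithmetic concrete, here is a minimal Python sketch of that charging rule. It is not our actual SLOT_WEIGHT expression, just an illustration: the function name slot_weight and its parameters are made up for this example, and it assumes memory is given in MB with a 2 GB chunk equal to 2048 MB.

```python
def slot_weight(cpus: int, memory_mb: int, chunk_mb: int = 2048) -> float:
    """Charge by CPU count or by 2 GB memory chunks, whichever is higher.

    Hypothetical helper for illustration only; chunk_mb=2048 assumes
    memory is expressed in MB.
    """
    return max(float(cpus), memory_mb / chunk_mb)


# The four cases listed above: (cpus, memory in MB)
for cpus, memory_mb in [(1, 2048), (1, 3072), (1, 4096), (2, 2048)]:
    print(cpus, memory_mb, slot_weight(cpus, memory_mb))
```

The weight can be fractional (the 1 CPU, 3 GB case), which is why the condor_userprio rounding mentioned further down matters.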

What I don't understand is why you would set the weight of the partitionable slot to 1; it should be set to however many CPUs remain in it at the time.


The trap is that if you have a lot of small submitters, the slot weight of the slots can sometimes be so big that a submitter with a low limit will never get a slot. In theory the negotiator could hand either an existing dynamic slot or the whole partitionable slot to the schedd. In practice it is more often the latter that we see. We sometimes see small submitters get frozen out, but they recently put in a patch to fix most of the problem.


The other thing that can happen is that condor_userprio doesn't accurately show the effects of a floating-point slot weight: it only reports integer resources used. The underlying math is right, though.


Finally, be sure that the slot weight variable never ends up undefined; major craziness can happen then.


Steve Timm



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Sent: Tuesday, August 13, 2019 11:12:20 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Changing slotweight for few nodes in pool
 
Hello Experts, 

A gentle follow-up email.

On Mon, 12 Aug, 2019, 19:52 Vikrant Aggarwal, <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

We are using partitionable slots in our setup. We introduced some high-memory nodes into our pool for which we want to charge more, and we charge based on CPU core usage. I am planning to multiply the default SLOT_WEIGHT value, Cpus, by a float to increase the user priority, which is used for charging.

From:

SLOT_WEIGHT = ifThenElse(SlotType == "Partitionable", 1, Cpus)

To something like the following, so that a few nodes in the pool will have this slot weight:

SLOT_WEIGHT = ifThenElse(SlotType == "Partitionable", 1, Cpus *1.2)

I read the following in HTCondor manual:

Enable use of the condor_negotiator-side resource consumption policy, allocating the job-requested number of cores to the dynamic slot, and use SLOT_WEIGHT to assess the user usage that will affect user priority by the number of cores allocated. Note that the only attributes valid within the SLOT_WEIGHT expression are Cpus, Memory, and disk. This must be set to the same value on all machines in the pool.

If I change the SLOT_WEIGHT on just a few nodes in the cluster, am I inviting unknown issues or limitations?

Thanks & Regards,
Vikrant Aggarwal