
Re: [HTCondor-users] condor_userprio's WeightedAccumulatedUsage



After examining Accountantnew.log, we think we see what's happening.

1) On a job's first billing cycle, the job is considered to be running in the node's partitionable slot. Since we're using the default SlotWeight (= Cpus), the partitionable slot's weight equals its number of unused cores, so the job is charged for all of those cores (28 here) rather than the one it actually uses.

103 Resource.slot1@node0050.tower-research.com@<10.xxx:9618?addrs=10.xxx-9618&noUDP&sock=1281881_a2e2_3> SlotWeight 28.000000

2) On the second billing cycle, the job is attributed to its dynamic slot, whose weight is the 1 CPU the job requested.

103 Resource.slot1_1@node0050.skae.tower-research.com@<10.xxx:9618?addrs=10.xxx-9618&noUDP&sock=1281881_a2e2_3> SlotWeight 1.000000
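To make the effect of those two log lines concrete, here is a minimal sketch in Python. It assumes a simplified model of the accountant (not HTCondor's actual code): at each billing cycle the submitter is charged SlotWeight times the cycle length. The 30-second cycle and the 10-cycle job are hypothetical values chosen for illustration.

```python
# Simplified model (an assumption, not HTCondor's Accountant implementation):
# each billing cycle charges SlotWeight * cycle length in seconds.

def charged_usage(slot_weights, cycle_seconds):
    """Return the weighted usage charged over successive billing cycles.

    slot_weights: SlotWeight recorded for the job's slot in each cycle.
    cycle_seconds: length of each billing cycle, in seconds (hypothetical).
    """
    return sum(weight * cycle_seconds for weight in slot_weights)

CYCLE = 30  # hypothetical billing-cycle length in seconds

# First cycle billed against the partitionable slot (weight 28),
# remaining cycles against the job's 1-CPU dynamic slot.
observed = [28.0] + [1.0] * 9
# What a 1-CPU job running for 10 cycles should be charged.
expected = [1.0] * 10

print(charged_usage(observed, CYCLE))  # 1110.0
print(charged_usage(expected, CYCLE))  # 300.0
```

Under this model the entire discrepancy (27 extra core-cycles) comes from the first cycle alone, which is why it is most visible for short jobs.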

We work around this problem by defining SLOT_WEIGHT as ifThenElse(SlotType == "Partitionable", 1, Cpus). This undercharges multicore jobs in their first billing cycle (they are charged a weight of 1 instead of their requested cores), but such jobs make up only a small percentage of our workload.
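The ClassAd expression above can be mirrored in Python to compare it with the default. This is an illustration only: HTCondor evaluates SLOT_WEIGHT as a ClassAd expression, and the function names here are hypothetical. It assumes dynamic slots report SlotType "Dynamic" and that the partitionable slot has 28 unused cores, as in the log lines.

```python
# Python mirrors of the two SLOT_WEIGHT expressions (illustration only;
# HTCondor evaluates these as ClassAd expressions, not Python).

def default_weight(slot_type, cpus):
    # Default: SlotWeight = Cpus, regardless of slot type.
    return cpus

def modified_weight(slot_type, cpus):
    # ifThenElse(SlotType == "Partitionable", 1, Cpus)
    return 1 if slot_type == "Partitionable" else cpus

PSLOT_CPUS = 28  # unused cores on the partitionable slot (from the log)

for request_cpus in (1, 4):
    first_default = default_weight("Partitionable", PSLOT_CPUS)
    first_modified = modified_weight("Partitionable", PSLOT_CPUS)
    later = modified_weight("Dynamic", request_cpus)
    print(f"RequestCpus={request_cpus}: first cycle default={first_default}, "
          f"modified={first_modified}; later cycles={later}")
```

For a 1-CPU job the modified expression charges the correct weight in every cycle; for a 4-CPU job it charges 1 instead of 4 in the first cycle, which is the undercharge described above.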

The second issue is that the length of the billing cycles is a function of NEGOTIATOR_INTERVAL and NEGOTIATOR_CYCLE_DELAY. We had reduced the former from its default of 60 seconds to 30; hence the overcharge for a single-core job was approximately TotalCpus * NEGOTIATOR_INTERVAL = 28 * 30 = 840 seconds.
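The overcharge estimate can be written as a small function. This sketch assumes the first billing cycle lasts exactly one cycle length and is billed at the partitionable slot's full weight; the function name is hypothetical. Subtracting the job's own core gives the exact extra charge under that assumption, while TotalCpus * cycle length is the rough upper estimate.

```python
def first_cycle_overcharge(total_cpus, cycle_seconds, request_cpus=1):
    """Extra weighted seconds charged in a job's first billing cycle.

    Assumes the first cycle is billed at the partitionable slot's weight
    (total_cpus) instead of the job's own weight (request_cpus).
    """
    return (total_cpus - request_cpus) * cycle_seconds

rough = 28 * 30                            # TotalCpus * cycle length
exact = first_cycle_overcharge(28, 30)     # (28 - 1) * 30
print(rough, exact)
```

With 28 cores the two figures differ by only one cycle's worth of a single core, so the rough product is a reasonable back-of-the-envelope number.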

By reducing both of these negotiator parameters to 1 second, we can get reasonably accurate billing. So far, the NegotiatorRecentDaemonCoreDutyCycle is staying below 50%.
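To see why shorter cycles make billing "reasonably accurate", the sketch below compares billed to actual usage for a hypothetical 600-second, 1-CPU job. It assumes the same simplified model: only the first cycle is billed at the partitionable slot's weight of 28, and every later second at the job's weight of 1.

```python
def billed_vs_actual(wall_seconds, cycle_seconds, pslot_weight=28, job_weight=1):
    """Ratio of billed to actual weighted usage (hypothetical model).

    The first cycle (or the whole job, if shorter) is billed at
    pslot_weight; the remaining time is billed at job_weight.
    """
    first = min(cycle_seconds, wall_seconds)
    billed = pslot_weight * first + job_weight * (wall_seconds - first)
    actual = job_weight * wall_seconds
    return billed / actual

for cycle in (30, 1):
    print(f"cycle={cycle}s: billed/actual = {billed_vs_actual(600, cycle):.3f}")
```

Under this model the 10-minute job is billed at roughly 2.35 times its real usage with 30-second cycles, but only about 4.5% over with 1-second cycles. The trade-off is more negotiator work per unit time, which is why the duty cycle is worth watching.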

Thanks for everyone's help.

Jon


On Fri, Apr 7, 2017 at 10:39 AM, Jon Bernard <jonbernard@xxxxxxxxx> wrote:
Sorry, I should have mentioned that sooner: this is HTCondor 8.5.8.

On Fri, Apr 7, 2017 at 10:10 AM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
On 04/06/2017 05:21 PM, Jon Bernard wrote:
Hi Ben,

The values for WeightedAccumulatedUsage and AccumulatedUsage don't match, but they don't match in our other pools either, where we don't see this problem (or at least not to this extent). The AccumulatedUsage is roughly equal to the sum of the jobs' wall times, though sometimes significantly less.

Jon:

What condor version is this? I know there were some bugs in this area that were fixed a few years ago.

-greg

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxx.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/