
Re: [HTCondor-users] condor_userprio's WeightedAccumulatedUsage



Hi Ben,

The values for WeightedAccumulatedUsage and AccumulatedUsage don't match, but they also don't match in our other pools, where this problem doesn't occur (or at least, not to this extent). AccumulatedUsage is about the same as the sum of the wall times (though sometimes significantly less).

(After I had looked at the source, I had expected the values to be the same, too, but they aren't the same for any of our users, nearly all of whom ask for only one slot.)

Yes, I've confirmed that the usage is being reset. No other jobs are executing for this user.

The slots are correctly reporting their slot weight (i.e., 1). Our job wrapper logs the values of the slot's Cpus attribute, as well, and this is 1 for these jobs (and reports the correct number of cpus for jobs requesting more than 1 cpu).
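
To put a number on the mismatch, here is a rough sketch that divides the reported usage by the summed wall time. The helper name implied_slot_weight is made up for illustration and is not an HTCondor command. It assumes usage is simply wall time multiplied by slot weight, which is my reading of Todd's explanation below:

```shell
#!/bin/sh
# Hypothetical helper (not an HTCondor tool): ratio of the usage reported by
# condor_userprio to the summed wall time. Under the assumption that usage is
# wall time * slot weight, this ratio would be the average slot weight.
implied_slot_weight() {
    # $1 = reported WeightedAccumulatedUsage, in seconds
    # $2 = summed remotewallclocktime, in seconds
    awk -v usage="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", usage / wall }'
}

# Numbers from the 30-second test case quoted later in this thread.
implied_slot_weight 600000 30219
```

For the 30-second test this prints 19.9, which would correspond to roughly 20-core slots rather than the weight of 1 that the slots actually report.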

This pool is our largest, with 25K cores. Our smallest pools, with about 1K cores each, show the problem to a lesser degree, and our 10K-core pool shows it to a greater degree than the small pools, but not nearly as badly as this one (and not so badly that we care right now). The configuration of these other pools is mostly the same as that of the largest pool. I dumped the configs and compared them, but didn't see anything that seemed like it could affect this.

Thanks,
Jon


On Thu, Apr 6, 2017 at 5:32 PM, Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx> wrote:
Hi Jon,

Looking at the source, it looks like the WeightedAccumulatedUsage
should use the slot weight as Todd said. If you do a `condor_userprio
-l`, does the AccumulatedUsageN match the WeightedAccumulatedUsageN
for the user in question?

Have you checked that the `condor_userprio -resetusage` is resetting
the usage? Are there any other jobs executing from that user?

Have you checked that the slots are correctly reporting their slot
weight? (e.g. with `condor_status -af SlotWeight Name`)


Thanks,
BC

On Thu, Apr 6, 2017 at 5:02 PM, Jon Bernard <jonbernard@xxxxxxxxx> wrote:
> Hi Todd,
>
> SLOT_WEIGHT = Cpus, but the number of cores per slot is only 1.
>
> Jon
>
>
>
> On Thu, Apr 6, 2017 at 3:05 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx>
> wrote:
>>
>> On 4/6/2017 1:45 PM, Jon Bernard wrote:
>>>
>>> Hi all,
>>>
>>> I'm seeing some strange numbers for WeightedAccumulatedUsage from one of
>>> our pools.
>>>
>>> Our test case is to submit 1000 jobs which sleep 30 seconds. The total
>>> remotewallclocktime for all the jobs is 30,219 seconds. However, the
>>> usage for the user reported by condor_userprio for these jobs is on the
>>> order of 600,000 seconds.
>>>
>>> For jobs which sleep 0 seconds, condor_userprio reports usage of 300,000
>>> to 600,000 seconds, as compared to about 200 seconds of walltime.
>>>
>>> The test script is essentially
>>>
>>> condor_userprio -resetusage <user>
>>> condor_submit sleep30
>>> clusterid=$(condor_q -af clusterid | head -n1)
>>> condor_wait -num 1000 /tmp/$clusterid.log
>>> condor_history -af remotewallclocktime -limit 1000 | awksum
>>> condor_userprio -allusers -const 'name == <user>' -af WeightedAccumulatedUsage
>>>
>>> Is there a configuration macro which might be affecting this?
>>>
>>> Thanks,
>>> Jon
>>>
>>
>> Hi Jon,
>>
>> What is the value of config knob SLOT_WEIGHT?
>>
>> By default, SLOT_WEIGHT = Cpus
>>
>> IIRC, the "Weighted" prefix in WeightedAccumulatedUsage means it takes the
>> SLOT_WEIGHT into account. So if you are using the default SLOT_WEIGHT =
>> Cpus, then I would expect to see the results you got above if your sleep
>> jobs ran on a lot of 20 core slots, i.e. slots where Cpus=20. (since 600k
>> seconds / 20 = 30k)
>>
>> Hope the above helps
>> Todd
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with
>> a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/



--
Ben Cotton
Technical Marketing Manager

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing