Re: [HTCondor-users] Total load average using multiple partitionable slots

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Hello Jonathan,

One of my clusters is currently running at full load with partitionable slots.

OS: CentOS 7.6

HTCondor: 8.8.0 Jan 03 2019 BuildID: 457757 PackageID: 8.8.0-1

Partitioning config:

NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = auto
SLOT_TYPE_1_PARTITIONABLE = True

One machine (of 32 total):

Compact result:

I personally don’t use condor to monitor load. I go with Ganglia.

Still works with a Linux CM and Windows exec machine, after some tweaking.

Martin

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Jonathan Martin
Sent: November 17, 2021 6:13 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Total load average using multiple partitionable slots

I’m struggling to interpret the load average metrics on machines that are configured with multiple partitionable slots.

As an example, here is the output of a basic condor_status command:

condor_status [machine]

Slot 1:

Slot 2:

We can see that all of the dynamic slots have an average CPU load close to 1, which I would expect. However, running this in compact form produces:

The CpuLoad doesn’t seem to correctly capture the total load summed across all dynamic slots. I’m wondering if this is a bug, or if there is another way I can capture total CPU load on a machine like this.
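
One possible workaround, as a rough sketch only: pull a per-slot load value for every slot and sum it per machine outside of HTCondor. The Python below assumes each input line has two whitespace-separated fields, a machine name and a load value, for example as produced by something like `condor_status -af Machine LoadAvg`. The choice of attribute and the helper name total_load_per_machine are assumptions for illustration, not something confirmed in this thread, and whether the parent partitionable slot should be counted is worth checking on your own pool.

```python
from collections import defaultdict


def total_load_per_machine(lines):
    """Sum a per-slot load value for each machine.

    Each line is expected to contain exactly two whitespace-separated
    fields: the machine name and that slot's load value (assumed format).
    """
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        machine, load = fields
        try:
            totals[machine] += float(load)
        except ValueError:
            continue  # skip non-numeric values such as "undefined"
    return dict(totals)
```

The function only does the arithmetic, so it can be fed from a file, a pipe, or a subprocess call, and compared against whatever the compact view reports.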

Thanks,

Jon

