One of my clusters is currently running at full load with partitionable slots.
OS: CentOS 7.6
HTCondor: 8.8.0 Jan 03 2019 BuildID: 457757 PackageID: 8.8.0-1
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = auto
SLOT_TYPE_1_PARTITIONABLE = True
One machine (of 32 total):
I personally don’t use condor to monitor load. I go with Ganglia.
It still works with a Linux central manager (CM) and a Windows execute machine, after some tweaking.
I’m struggling to interpret the load average metrics on machines that are configured with multiple partitionable slots.
As an example, here is the output of a basic condor_status command:
We can see that all of the dynamic slots have an average CPU load close to 1, which I would expect. However, running this in compact form produces:
The CpuLoad doesn’t seem to correctly capture the total load summed across all dynamic slots. I’m wondering if this is a bug, or if there is another way I can capture total CPU load on a machine like this.
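One possible workaround is to sum the per-slot values yourself. The sketch below is only an illustration: it assumes the per-slot loads can be exported as plain "<machine> <load>" rows, for instance from autoformat output along the lines of `condor_status -af Machine LoadAvg` (an assumption, not verified against 8.8.0). The function name and the sample host names are hypothetical. Note that the partitionable parent slot would normally show up as a row as well, so its own value gets added to the total.

```python
from collections import defaultdict


def total_load_by_machine(lines):
    """Sum per-slot load values, grouped by machine name.

    Each input line is expected to hold two whitespace-separated
    fields: a machine name and a numeric load value.
    """
    totals = defaultdict(float)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed rows
        machine, load = parts
        try:
            totals[machine] += float(load)
        except ValueError:
            continue  # skip non-numeric values such as "undefined"
    return dict(totals)


# Hypothetical sample rows, one per slot (not real cluster output).
sample = [
    "node01.example.org 1.0",
    "node01.example.org 0.5",
    "node02.example.org 0.25",
    "node02.example.org undefined",
]

if __name__ == "__main__":
    for machine, load in sorted(total_load_by_machine(sample).items()):
        print(machine, load)
```

In practice the rows would be read from the command's standard output (for example via `subprocess.run`) rather than from a hard-coded list.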