
Re: [HTCondor-users] CPU accounting: NonCondorLoadAvg



And here's another observation. On some slots - but not all - LoadAvg is exactly 1 larger than CondorLoadAvg.

Showing columns in the following order:
* TotalLoadAvg
* TotalCondorLoadAvg
* LoadAvg
* CondorLoadAvg

I see the following:

$ condor_status -format %17.17s Name -format " %-9.9s" State -format " %-8.8s" Activity -format " %4d" Cpus -format " %6.3f" TotalLoadAvg -format " %6.3f" TotalCondorLoadAvg -format " %6.3f" LoadAvg -format " %6.3f\n" CondorLoadAvg | grep dar3
slot1@xxxxxxxxxxx Owner     Idle       18 13.700  9.770  1.000 0.000
slot1_11@xxxxxxxx Claimed   Busy        1 13.700  9.770  0.650 0.650
slot1_12@xxxxxxxx Claimed   Busy        1 13.700  9.770  0.650 0.650
slot1_13@xxxxxxxx Claimed   Busy        1 13.700  9.770  0.670 0.670
slot1_14@xxxxxxxx Claimed   Busy        1 13.700  9.770  1.630 0.700
slot1_15@xxxxxxxx Claimed   Busy        1 13.700  9.770  1.720 0.720
slot1_16@xxxxxxxx Claimed   Busy        1 13.700  9.770  1.770 0.770
slot1_1@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710 0.710
slot1_2@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710 0.710
slot1_3@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.760 0.760
slot1_4@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710 0.710
slot1_5@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.690 0.690
slot1_6@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710 0.710
slot1_7@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.660 0.660
slot1_8@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.670 0.670
$ ssh dar3 uptime
 17:04:52 up 19 days, 22:48,  0 users,  load average: 13.79, 13.82, 14.05
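As a sanity check of those dar3 numbers (hypothetical pipeline, just re-summing the rows pasted above): the per-slot LoadAvg minus CondorLoadAvg differences should add up to TotalLoadAvg - TotalCondorLoadAvg, i.e. 13.700 - 9.770 = 3.930.

```shell
# Columns: LoadAvg CondorLoadAvg, one line per dar3 slot as shown above.
printf '%s\n' \
  '1.000 0.000' '0.650 0.650' '0.650 0.650' '0.670 0.670' \
  '1.630 0.700' '1.720 0.720' '1.770 0.770' '0.710 0.710' \
  '0.710 0.710' '0.760 0.760' '0.710 0.710' '0.690 0.690' \
  '0.710 0.710' '0.660 0.660' '0.670 0.670' |
awk '{ d += $1 - $2 } END { printf "non-Condor load = %.3f\n", d }'
```

which prints "non-Condor load = 3.930", matching 13.700 - 9.770.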

$ condor_status -format %17.17s Name -format " %-9.9s" State -format " %-8.8s" Activity -format " %4d" Cpus -format " %6.3f" TotalLoadAvg -format " %6.3f" TotalCondorLoadAvg -format " %6.3f" LoadAvg -format " %6.3f\n" CondorLoadAvg | grep dar4
slot1@xxxxxxxxxxx Owner     Idle       27  5.300  1.000  1.000 0.000
slot1_1@xxxxxxxxx Claimed   Busy        1  5.300  1.000  0.200 0.200
slot1_2@xxxxxxxxx Claimed   Busy        1  5.300  1.000  0.500 0.200
slot1_4@xxxxxxxxx Claimed   Busy        1  5.300  1.000  1.200 0.200
slot1_7@xxxxxxxxx Claimed   Busy        1  5.300  1.000  1.200 0.200
slot1_8@xxxxxxxxx Claimed   Busy        1  5.300  1.000  1.200 0.200
$ ssh dar4 uptime
 17:04:38 up 19 days, 22:40,  0 users,  load average: 5.33, 5.22, 5.22

It looks like TotalLoadAvg is the sum of LoadAvg across the slots (5.300 = 1.000+0.200+0.500+1.200+1.200+1.200). Note that this includes the 1.000 reported by slot1, which is idle!

Likewise, TotalCondorLoadAvg is the sum of CondorLoadAvg (1.000 = 0.200+0.200+0.200+0.200+0.200).
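Re-summing the dar4 rows the same way (again just a hypothetical check on the pasted output, columns are LoadAvg and CondorLoadAvg):

```shell
# One line per dar4 slot as shown above: LoadAvg CondorLoadAvg.
printf '%s\n' \
  '1.000 0.000' '0.200 0.200' '0.500 0.200' \
  '1.200 0.200' '1.200 0.200' '1.200 0.200' |
awk '{ l += $1; c += $2 }
     END { printf "TotalLoadAvg=%.3f TotalCondorLoadAvg=%.3f\n", l, c }'
```

which prints "TotalLoadAvg=5.300 TotalCondorLoadAvg=1.000", matching the advertised totals.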

I found some code in src/condor_startd.V6/ResMgr.cpp that appears to spread the "owner load" across the slots at up to 1.0 per slot, which I think explains this. That "owner load" is computed as m_attr->load() - m_attr->condor_load().
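Here is my reading of that spreading logic as an awk sketch (this paraphrases what I think ResMgr.cpp does, it is not the actual code): the owner load, i.e. total load minus Condor load, is handed out to slots in chunks of at most 1.0 until it is used up.

```shell
# Sketch of the assumed owner-load spreading, using the dar3 numbers:
# owner load = TotalLoadAvg - TotalCondorLoadAvg = 13.700 - 9.770.
awk 'BEGIN {
  owner = 13.700 - 9.770               # owner load to distribute: 3.930
  for (slot = 1; owner > 0; slot++) {
    share = (owner >= 1.0) ? 1.0 : owner   # at most 1.0 per slot
    owner -= share
    printf "slot %d assigned owner load %.3f\n", slot, share
  }
}'
```

That yields 1.000 on three slots and 0.930 on a fourth, which is exactly the pattern of LoadAvg - CondorLoadAvg in the dar3 listing above (slot1, slot1_14, slot1_15, slot1_16).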

Unfortunately, with I/O-waiting applications, the summed CPU utilisation of the processes is not directly comparable to the /proc/loadavg values, so the difference isn't going to give the "owner" load as far as I can see.
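For anyone following along, the reason the two measures diverge (Linux-specific): the kernel's load average counts tasks in uninterruptible sleep (D state, usually I/O wait) as well as runnable tasks, so it can exceed the summed CPU utilisation of the very same processes. A trivial look at the raw value:

```shell
# /proc/loadavg fields 1-3 are the 1-, 5- and 15-minute load averages;
# these include D-state (I/O-waiting) tasks, not just CPU-busy ones.
awk '{ printf "1-min load average: %s\n", $1 }' /proc/loadavg
```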

Regards,

Brian.