
Re: [HTCondor-users] HTCondor v8.8.1 not detecting Owner State?



Between 8.6 and 8.8, the default value of the IS_OWNER configuration knob changed from

  IS_OWNER=(START =?= False)

to

  IS_OWNER = false

 

For several years now, most HTCondor users have had dedicated pools of execute nodes, so in 8.8 the default
configuration for a Startd is to always run jobs, and to assume that no users are running jobs on the machine
outside the control of the HTCondor Startd.

 

If you want the old policy, where execute machines are assumed to be owned by users and HTCondor only uses
spare cycles, then you should add this to your configuration:

 

   Use POLICY : Desktop

 

For similar reasons, the LoadAv column of condor_status in 8.8 shows the load average attributable to HTCondor

and jobs under HTCondor rather than the load average of the machine as a whole.

 

If you run condor_status -long LoadAvg, I think you will see that the overall load average is still reported to the collector;
it just doesn't appear in the default condor_status output.
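
As a rough sketch of how one might check this, the snippet below pulls every attribute whose name contains "LoadAvg" out of saved condor_status -long output. The helper name load_avg_attrs is hypothetical, and the sample ad is invented for illustration (it is not real output from the pool discussed here); it only assumes the usual "Attribute = value" layout of a long-form ClassAd.

```python
def load_avg_attrs(classad_text):
    """Return {attribute: float} for attributes whose name contains 'LoadAvg'."""
    result = {}
    for line in classad_text.splitlines():
        # Long-form ClassAd lines look like "Name = value"; split on the first "=".
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and "LoadAvg" in name:
            try:
                result[name] = float(value.strip())
            except ValueError:
                # Skip values that are not plain numbers (e.g. expressions).
                pass
    return result


# Invented sample ad: attribute values are made up for illustration only.
sample = """\
Name = "slot4@example"
LoadAvg = 0.0
CondorLoadAvg = 0.0
TotalLoadAvg = 7.82
"""

print(load_avg_attrs(sample))
```

In practice the text would come from something like the output of condor_status -long for one machine, saved to a file or captured from the command.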

 

-tj

 

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Nicola Caon
Sent: Tuesday, April 23, 2019 11:21 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] HTCondor v8.8.1 not detecting Owner State?

 

Hi,

 

a few weeks ago I upgraded HTCondor to v8.8.1 on all machines in our pool (mainly consisting of regular users' Linux desktops).

 

Surprisingly, after the upgrade all machines appeared with State either Claimed or Unclaimed, none with Owner. After reverting to the previously installed version (8.6.12) and investigating the problem, it seems to me that the new HTCondor version is not returning the correct value of LoadAv (indeed, it is always 0.0 except for Claimed machines).

 

In the example below, "rusia" is a Fedora 26 machine, with HTCondor v8.8.1 (installed from the tarballs). It is currently running several programs which keep the overall CPU usage above 60% or so. This is how its status appears from another Linux box, using v8.6.12 and v8.8.1.

 

ncaon@venezia> /usr/pkg/condor/condor-8.6.12-x86_64_RedHat7-stripped/bin/condor_status rusia
Name                  OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 1961  0+02:29:30
slot2@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 1961  0+02:30:03
slot3@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 1961  0+02:30:03
slot4@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      7.820 1961  0+02:30:03

 

 

ncaon@venezia> /usr/pkg/condor/condor-8.8.1-x86_64_RedHat7-stripped/bin/condor_status rusia
Name                  OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1961  0+02:34:30
slot2@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1961  0+02:35:03
slot3@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1961  0+02:35:03
slot4@xxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1961  0+02:35:03

The LoadAv from v8.6.12 is indeed consistent with the value printed by the uptime command. Has anything changed in the latest (stable) releases with respect to how the LoadAv values are obtained and how the State of the machine is determined? (Version 8.8.2 shows the same behavior as v8.8.1)

 

Thanks!

 

Nicola