[HTCondor-users] zero cputime reported by startd 10.5.0



Hi,
we are evaluating an upgrade of our Execution Points from 9.0.17 to the current Feature Release (10.5.0),
while keeping 9.0.17 on the SCHEDDs (especially those with condor-ce) and on the Central Manager.

So I upgraded a WN to 10.5.0, saw it working as expected, and let it run over the weekend.
Today I checked the accounting for jobs that ran on that machine and noticed that zero or very little cputime
is accounted.
This looks strange, as running top on the machine shows the cores "working hard".

To verify a little further I ran a known test program that just crunches integer numbers for about 5 minutes.
It ran as expected and reported the correct final result (so the program HAS run), but the accounting still reports zero cputime.
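
For anyone wanting to reproduce this kind of check, here is a minimal sketch of a comparable CPU-bound test. It is not the actual program used above; the function names (burn_cpu, run_cpu_test) are made up for illustration. It spins on integer arithmetic for a given wall-clock duration and reports the CPU time the process itself measured, which can then be compared with what the accounting records for the same job.

```python
import resource  # Unix-only standard library module
import time


def burn_cpu(seconds):
    """Spin on integer arithmetic until `seconds` of wall-clock time have passed."""
    deadline = time.monotonic() + seconds
    total = 0
    i = 0
    while time.monotonic() < deadline:
        total = (total + i * i) % 1_000_003
        i += 1
    return total


def run_cpu_test(seconds):
    """Run the busy loop and return wall, user and sys seconds seen by this process."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.monotonic()
    checksum = burn_cpu(seconds)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall": wall,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
        "checksum": checksum,
    }
```

Calling run_cpu_test(300) inside a job and printing the result gives a self-measured figure of roughly 300 user CPU seconds on an otherwise idle core, so a 0.0 in the job accounting would point at the reporting path rather than at the job.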

In the Job history file these are the relevant reported values:

[root@ce06-htc 2023-6]# cat history.9602771.0 | egrep -i '^RemoteWallClockTime|cpu'
CPUsUsage = 0.9997753100861237
CpusProvisioned = 1
CumulativeRemoteSysCpu = 0.0
CumulativeRemoteUserCpu = 0.0
MATCH_EXP_numcpus = "32"
MATCH_TotalSlotCpus = 32
MachineAttrCpus0 = 1
RemoteSysCpu = 0.0
RemoteUserCpu = 0.0
RemoteWallClockTime = 503.0
RequestCpus = 1
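
A minimal sketch of how this mismatch could be flagged automatically across many history files. It assumes the files contain plain "Name = value" lines like the ones above, and it assumes that CPUsUsage is the average number of cores the job kept busy, so that CPUsUsage * RemoteWallClockTime approximates the CPU seconds actually consumed. The helper names and thresholds are hypothetical.

```python
def parse_classad_text(text):
    """Parse 'Name = value' lines into a dict of raw string values."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def cpu_summary(attrs):
    """Return wall time, reported CPU time and CPU time implied by CPUsUsage."""
    wall = float(attrs.get("RemoteWallClockTime", 0.0))
    reported = (float(attrs.get("RemoteUserCpu", 0.0))
                + float(attrs.get("RemoteSysCpu", 0.0)))
    # Assumption: CPUsUsage is the average number of cores in use.
    implied = float(attrs.get("CPUsUsage", 0.0)) * wall
    return {"wall": wall, "reported": reported, "implied": implied}


def looks_unaccounted(attrs, min_wall=60.0, min_ratio=0.05):
    """True if a job ran for a while but reports almost no CPU time.

    min_wall and min_ratio are arbitrary illustrative thresholds.
    """
    summary = cpu_summary(attrs)
    if summary["wall"] < min_wall:
        return False
    return summary["reported"] < min_ratio * summary["implied"]
```

For the job above this gives an implied CPU time of about 503 seconds against a reported 0.0, so the job is flagged.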


In general, *SysCpu and *UserCpu are at 0.0; only rarely are a few seconds reported (<= 5).

[root@wn-200-10-11-02-a ~]# condor_version

$CondorVersion: 10.5.0 2023-06-05 BuildID: 650732 PackageID: 10.5.0-1 $
$CondorPlatform: x86_64_CentOS7 $

Could this be a communication problem with the 9.0.17 schedd?

Cheers,
Stefano