
[HTCondor-users] condor_history - CumulativeSlotTime < CommittedSlotTime ???



Hi All

Management have been wanting some more usage type stats to make pretty graphs, etc.

So I've made some mods to my scripts that normally produce monthly overall job numbers and CPU hrs of usage.
This makes use of the condor_history command.

The script can now produce daily stats per user over a given time period. Not enough apparently.
They would also like other info, e.g. number of times a job restarted, total (cumulative) run-time, run-time when it
ran to completion, etc.

OK, I added more outputs from condor_history, e.g. JobRunCount, CommittedSlotTime, CumulativeSlotTime, etc.
When looking at the overall output, there were days where a user's total CumulativeSlotTime was less than
their total CommittedSlotTime.
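
For reference, the per-user, per-day comparison described above can be sketched roughly as follows. This is only an illustration, not the actual script: the function names and the dict layout are assumptions, days are taken from CompletionDate in UTC, and the second job record is made up purely so that there is something to sum.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_slot_totals(jobs):
    """Sum CommittedSlotTime and CumulativeSlotTime per (Owner, completion day in UTC)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for job in jobs:
        day = datetime.fromtimestamp(job["CompletionDate"], tz=timezone.utc).date().isoformat()
        entry = totals[(job["Owner"], day)]
        entry[0] += job["CommittedSlotTime"]
        entry[1] += job["CumulativeSlotTime"]
    return dict(totals)


def suspicious_days(totals):
    """Return the (owner, day) keys whose cumulative total is below the committed total."""
    return sorted(key for key, (committed, cumulative) in totals.items()
                  if cumulative < committed)


jobs = [
    # Values taken from the single job quoted in this post.
    {"Owner": "ota006", "CompletionDate": 1496157937,
     "CommittedSlotTime": 2045.0, "CumulativeSlotTime": 1048.0},
    # Hypothetical second job, invented for illustration only.
    {"Owner": "ota006", "CompletionDate": 1496157000,
     "CommittedSlotTime": 600.0, "CumulativeSlotTime": 600.0},
]

totals = daily_slot_totals(jobs)
print(suspicious_days(totals))
```

A single odd job is enough to drag a whole day's total below the committed figure, which is why the per-job listing is the next place to look.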

Looking at individual jobs in more detail showed some where this was the case. I've tried to make sense of it
from the verbose listing but can't; the output below is for one such job. Some other jobs have a CumulativeSlotTime
(and RemoteWallClockTime) of zero but still report a value for CommittedSlotTime?

Any insights appreciated.

Cheers

Greg

CommittedSlotTime = 2045.0
CommittedSuspensionTime = 299
CommittedTime = 2045
CompletionDate = 1496157937
CumulativeSlotTime = 1048.0
CumulativeSuspensionTime = 299
EnteredCurrentStatus = 1496157937
JobCurrentStartDate = 1496156889
JobCurrentStartExecutingDate = 1496156926
JobCurrentStartTransferOutputDate = 1496157895
JobFinishedHookDone = 1496157940
JobLastStartDate = 1496156860
JobStartDate = 1496156860
JobStatus = 4
LastJobStatus = 2
LastMatchTime = 1496156889
LastSuspensionTime = 0
LocalSysCpu = 0.0
LocalUserCpu = 0.0
NumJobMatches = 2
NumRestarts = 0
NumShadowExceptions = 1
NumShadowStarts = 2
NumSystemHolds = 0
QDate = 1496149612
RemoteSysCpu = 148.0
RemoteUserCpu = 507.0
RemoteWallClockTime = 1048.0
TotalSuspensions = 1


JobFinishedHookDone - JobCurrentStartDate = 1051
JobFinishedHookDone - JobCurrentStartExecutingDate = 1014
JobFinishedHookDone - JobStartDate = 1080

EnteredCurrentStatus - JobCurrentStartDate = 1048
EnteredCurrentStatus - JobCurrentStartExecutingDate = 1011
EnteredCurrentStatus - JobStartDate = 1077

CumulativeSlotTime - CommittedSlotTime = -997
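
The differences above can be recomputed directly from the listed attribute values. The sketch below assumes the listing is plain `Name = value` text, one attribute per line; the helper name `parse_classad_text` is hypothetical and only numeric attributes are kept.

```python
def parse_classad_text(text):
    """Parse 'Name = value' lines into a dict, keeping numeric values only."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # not an attribute line
        try:
            ad[name.strip()] = float(value)
        except ValueError:
            pass  # skip strings and expressions
    return ad


# Subset of the attributes listed above for the job in question.
job_text = """\
CommittedSlotTime = 2045.0
CumulativeSlotTime = 1048.0
EnteredCurrentStatus = 1496157937
JobCurrentStartDate = 1496156889
JobCurrentStartExecutingDate = 1496156926
JobFinishedHookDone = 1496157940
JobStartDate = 1496156860
"""

ad = parse_classad_text(job_text)

starts = ("JobCurrentStartDate", "JobCurrentStartExecutingDate", "JobStartDate")
for end in ("JobFinishedHookDone", "EnteredCurrentStatus"):
    for start in starts:
        print(f"{end} - {start} = {ad[end] - ad[start]:.0f}")

slot_gap = ad["CumulativeSlotTime"] - ad["CommittedSlotTime"]
print(f"CumulativeSlotTime - CommittedSlotTime = {slot_gap:.0f}")
```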

I can't see where/how the CommittedSlotTime is calculated/determined.


Output from my modified condor_history output:

ID            OWNER   SUBMITTED            RUN_TIME    ST  COMPLETED            CMD         JobRunCount  CommittedSlotTime  CumulativeSlotTime
599668.45939  ota006  2017/05/30 23:06:52  0+00:17:28  C   2017/05/31 01:25:37  E:\ota006\  2            2045.0             1048.0
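
To compare the RUN_TIME column with the slot-time attributes, it helps to convert it to seconds. A minimal sketch, assuming the column is always in `days+HH:MM:SS` form as in the row above (the function name is hypothetical):

```python
def run_time_to_seconds(run_time):
    """Convert a 'days+HH:MM:SS' string to a number of seconds."""
    days, _, clock = run_time.partition("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


print(run_time_to_seconds("0+00:17:28"))  # 1048
```

For this job the RUN_TIME of 0+00:17:28 is 1048 seconds, which matches CumulativeSlotTime and RemoteWallClockTime rather than CommittedSlotTime.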