
Re: [HTCondor-users] RemoteWallClockTime vs CommittedTime



Hi Dan,

No, this is always after the job has completed or been killed. The RemoteWallClockTime is always > 0.

Interesting that you mention that RemoteWallClockTime is updated when the job gets evicted. I am now wondering whether this is an additive process or just a plain update. That is, if a job gets evicted multiple times, does RemoteWallClockTime accumulate across the runs, or is it overwritten with the latest value?

Alternatively, is there some means of killing the job (maybe kill -9) that would leave the values in the state I am seeing?

Maybe I am not understanding the documentation correctly, but it seems that RemoteWallClockTime should always be >= CommittedTime.
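
For reference, the check we would expect every completed job record to pass can be sketched roughly as below. This is only an illustration of the invariant I am describing, not our actual accounting code: the record layout, the "JobId" and "TotalCpu" keys, and the find_inconsistent helper are all hypothetical, and the sample values are the case quoted in my original message plus one made-up consistent record.

```python
def find_inconsistent(records):
    """Return (record, reasons) pairs for records that break the expected invariants.

    Each record is a plain dict of times in seconds. The keys "JobId" and
    "TotalCpu" are hypothetical names used only for this sketch.
    """
    flagged = []
    for rec in records:
        wall = rec.get("RemoteWallClockTime", 0)
        committed = rec.get("CommittedTime", 0)
        cpu = rec.get("TotalCpu", 0)

        reasons = []
        # Expectation from my reading of the documentation:
        # RemoteWallClockTime >= CommittedTime.
        if committed > wall:
            reasons.append("CommittedTime > RemoteWallClockTime")
        # CPU exceeding wall time is suspicious for a single-core job;
        # a multi-core job could legitimately exceed it.
        if cpu > wall:
            reasons.append("TotalCpu > RemoteWallClockTime")

        if reasons:
            flagged.append((rec, reasons))
    return flagged


if __name__ == "__main__":
    sample = [
        # The case from the original message.
        {"JobId": "example-1", "CommittedTime": 3944,
         "RemoteWallClockTime": 1, "TotalCpu": 1935},
        # A made-up record that satisfies both checks.
        {"JobId": "example-2", "CommittedTime": 100,
         "RemoteWallClockTime": 120, "TotalCpu": 90},
    ]
    for rec, reasons in find_inconsistent(sample):
        print(rec["JobId"], "; ".join(reasons))
```

Records flagged by a check like this are the ones I am asking about.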

John Weigand

On 10/15/2013 12:57 PM, Daniel Forrest wrote:
> John,
>
>> We have noticed a problem in collecting accounting data from the HTCondor
>> classads. We are seeing situations where CPU is exceeding Wall time.
>>
>> We use the RemoteWallClockTime classad as the basis of Wall time. According
>> to the documentation, this appears to be the correct one to use. The accounting
>> system also captures CommittedTime. We are seeing conditions where
>> CommittedTime exceeds RemoteWallClockTime. One of many cases....
>>   CommittedTime = 3944     RemoteWallClockTime = 1   Total CPU = 1935
>>
>> Based on the documentation, if I am interpreting it correctly, CommittedTime
>> should never exceed RemoteWallClockTime since CommittedTime can get reset to
>> zero if evicted w/o a checkpoint. And RemoteWallClockTime does not.
>>
>> I am trying to understand under what conditions this can occur.
>> It is making no sense to us.
>
> Is this happening while the jobs are actively running? Because the
> RemoteWallClockTime returned from condor_q is only accurate when the
> job is not running.
>
> I have jobs running now with multiple hours of CommittedTime, but with
> RemoteWallClockTime still zero. If evicted, the RemoteWallClockTime
> is updated.