
Re: [HTCondor-users] RemoteWallClockTime doesn't reset when failed job reruns



Hi David,

It seems intentional that RemoteWallClockTime does not reset. I think you are looking for CommittedTime, or you might want to work with JobCurrentStartDate. The list of job ClassAd attributes [0] is a good reference.

> ## RemoteWallClockTime
>
> Cumulative number of seconds the job has been allocated a machine. This also includes time spent in suspension (if any), so the total real time spent running is
>
>     RemoteWallClockTime - CumulativeSuspensionTime
>
> Note that this number does not get reset to zero when a job is forced to migrate from one machine to another. CommittedTime, on the other hand, is just like RemoteWallClockTime except it does get reset to 0 whenever the job is evicted without a checkpoint.
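
To make the difference between these counters concrete, here is a minimal Python sketch. It assumes the job ad has already been fetched into a plain dict (for example via condor_q or the Python bindings); the function name `run_time_summary` and the sample values are made up for illustration and are not part of HTCondor. The "current run" figure is only meaningful while the job is actually running.

```python
import time


def run_time_summary(job_ad, now=None):
    """Derive run-time figures from a job ad given as a plain dict.

    Hypothetical helper for illustration; attribute names follow the
    job ClassAd attribute list, everything else is an assumption.
    """
    now = time.time() if now is None else now
    wall = job_ad.get("RemoteWallClockTime", 0)
    suspended = job_ad.get("CumulativeSuspensionTime", 0)
    start = job_ad.get("JobCurrentStartDate")  # epoch seconds, may be absent
    return {
        # Cumulative over all runs, never reset on rerun or migration.
        "cumulative_running": wall - suspended,
        # Reset to 0 when the job is evicted without a checkpoint.
        "committed": job_ad.get("CommittedTime", 0),
        # Seconds since the current run started, if the attribute is set.
        "current_run": None if start is None else now - start,
    }


# Made-up values, not taken from a real job.
example_ad = {
    "RemoteWallClockTime": 5400,
    "CumulativeSuspensionTime": 600,
    "CommittedTime": 1800,
    "JobCurrentStartDate": 1611650000,
}
print(run_time_summary(example_ad, now=1611651800))
```

If you need a per-run wall time, the last figure (based on JobCurrentStartDate) is probably the closest match, since it does not depend on how earlier runs ended.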

Cheers,
Max

[0]
https://htcondor.readthedocs.io/en/stable/classad-attributes/job-classad-attributes.html

> On 26. Jan 2021, at 10:08, David Cohen <cdavid@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> A user of ours, who was using a false exit code to make jobs rerun from a checkpoint, noticed that RemoteWallClockTime doesn't reset when the job reruns.
> Is that the intended behavior of the walltime counter?
> 
> Best regards,
> David
