Hi all,

I am struggling to interpret jobs on our cluster that ended up in state 3 (removed) and have event dates that seem odd to me.

For example, job [1] was submitted through a CondorCE onto the cluster and was removed around 1641290372 (its EnteredCurrentStatus). The job has no RemoteWallClockTime (undefined), yet the job (shadow??) has a number of start dates. Since the job has two shadow starts and a run count of two (NumShadowStarts and JobRunCount are both 2) but only one actual job start (NumJobStarts is 1), I am unsure how to interpret the job start dates here.

I suppose the initial start date points to the first shadow/job count, with a second start(?) around the CurrentStart* dates. But has there actually been a job instance that ran on a node? Or do the various start dates refer to shadow events (and if so, what did the shadow do)? Given that no RemoteWallClockTime was logged, what happened between the CurrentStart*Date values and EnteredCurrentStatus?

Btw: what is actually the difference between JobCurrentStartDate and JobCurrentStartExecutingDate? I would read [2] to mean that JobCurrentStartDate is the moment the sandbox transfer is initiated and that JobCurrentStartExecutingDate is the moment the transfer has finished and the job actually starts, is that right? (However, this interpretation breaks down here, where the *ExecutingDate is earlier than the *StartDate.)

Maybe somebody has an idea what the event flow of this shadow/job might have been?
(package versions during history generation were [3])

Cheers and thanks for ideas,
  Thomas

[1]
ClusterID: 2131886
JobStatus: 3
QDate: 1641290372
JobStartDate: 1641270888
JobCurrentStartDate: 1641278464
JobCurrentStartExecutingDate: 1641270889
EnteredCurrentStatus: 1641290372
RemoteWallClockTime: undefined
CumulativeSlotTime: 0
CommittedTime: 0
CompletionDate: 0
NumShadowStarts: 2
NumJobStarts: 1
JobRunCount: 2

[2]
https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html?highlight=JobCurrentStartExecutingDate#job-classad-attributes

[3]
condor-9.0.8-1.el7.x86_64
condor-classads-9.0.8-1.el7.x86_64
condor-externals-9.0.8-1.el7.x86_64
condor-procd-9.0.8-1.el7.x86_64
htcondor-ce-5.1.2-1.el7.noarch
htcondor-ce-apel-5.1.2-1.el7.noarch
htcondor-ce-client-5.1.2-1.el7.noarch
htcondor-ce-condor-5.1.2-1.el7.noarch
htcondor-ce-view-5.1.2-1.el7.noarch
python2-condor-9.0.8-1.el7.x86_64
python3-condor-9.0.8-1.el7.x86_64
on EL7 3.10.0-1160.36.2.el7.x86_64

[queries.a]
> condor_history -file history.a 2131886 -af \
    ClusterID RoutedFromJobId JobStatus QDate JobStartDate \
    JobCurrentStartDate JobCurrentStartExecutingDate EnteredCurrentStatus \
    RemoteWallClockTime CumulativeSlotTime CommittedTime CompletionDate \
    NumShadowStarts NumJobStarts JobRunCount
2131886 1143391.0 3 1641290372 1641270888 1641278464 1641270889 1641290372 undefined 0 0 0 2 1 2

[queries.b]
> condor_history -file history.a 2131886 \
    -format "ClusterID: %d\n" ClusterID \
    -format "RoutedFromJobId: %s\n" RoutedFromJobId \
    -format "JobStatus: %d\n" JobStatus \
    -format "QDate: %d\n" QDate \
    -format "JobStartDate: %d\n" JobStartDate \
    -format "JobCurrentStartDate: %d\n" JobCurrentStartDate \
    -format "JobCurrentStartExecutingDate: %d\n" JobCurrentStartExecutingDate \
    -format "EnteredCurrentStatus: %d\n" EnteredCurrentStatus \
    -format "RemoteWallClockTime: %V\n" RemoteWallClockTime \
    -format "CumulativeSlotTime: %d\n" CumulativeSlotTime \
    -format "CommittedTime: %d\n" CommittedTime \
    -format "CompletionDate: %d\n" CompletionDate \
    -format "NumShadowStarts: %d\n" NumShadowStarts \
    -format "NumJobStarts: %d\n" NumJobStarts \
    -format "JobRunCount: %d\n" JobRunCount
ClusterID: 2131886
RoutedFromJobId: 1143391.0
JobStatus: 3
QDate: 1641290372
JobStartDate: 1641270888
JobCurrentStartDate: 1641278464
JobCurrentStartExecutingDate: 1641270889
EnteredCurrentStatus: 1641290372
RemoteWallClockTime: undefined
CumulativeSlotTime: 0
CommittedTime: 0
CompletionDate: 0
NumShadowStarts: 2
NumJobStarts: 1
JobRunCount: 2
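
To make the ordering question easier to see, the date attributes from [1] can be sorted chronologically. The following is only a minimal Python sketch: the helper name `timeline` is made up for illustration, and it does plain arithmetic on the values copied from the job ad. It does not query HTCondor and says nothing about what the shadow actually did.

```python
from datetime import datetime, timezone

# Epoch timestamps copied verbatim from the job ad in [1].
events = {
    "QDate": 1641290372,
    "JobStartDate": 1641270888,
    "JobCurrentStartDate": 1641278464,
    "JobCurrentStartExecutingDate": 1641270889,
    "EnteredCurrentStatus": 1641290372,
}


def timeline(ev):
    """Return (name, timestamp, seconds since previous event), sorted by time.

    Hypothetical helper for illustration only. Ties keep the insertion
    order of `ev`, because Python's sort is stable.
    """
    rows = []
    prev = None
    for name, ts in sorted(ev.items(), key=lambda kv: kv[1]):
        delta = 0 if prev is None else ts - prev
        rows.append((name, ts, delta))
        prev = ts
    return rows


for name, ts, delta in timeline(events):
    utc = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    print(f"{name:30s} {ts}  {utc}  +{delta}s")
```

With the values above, JobCurrentStartExecutingDate falls 1 s after JobStartDate and 7575 s before JobCurrentStartDate, and 11908 s pass between JobCurrentStartDate and EnteredCurrentStatus. QDate carries the same value as EnteredCurrentStatus, so it sorts after all three start dates.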