
Re: [HTCondor-users] History file with incorrect JobCurrentStartDate? - version windows 8.4.4



We've done some digging in the code, and the only thing we can think of that would explain this is if the clock jumped ahead by about 2 days
just at the time the job started, then jumped back again for the remainder of the job execution.

JobCurrentStartDate is set by the schedd at the time it creates the shadow;
the rest of the numbers are set or calculated by the shadow based on the current system clock time.

So in order for this to happen, the schedd would need to get 1521637146 from the system clock when it creates the shadow,
and the shadow would need to get 1521445871 from the system clock when the job actually begins running just a few seconds
later. Otherwise, we can't explain how the CommittedSlotTime is negative: it is calculated by the shadow, which means that the shadow
would have to SEE a JobCurrentStartDate that is in the future relative to its own clock.

If this were a case of all of the values but JobCurrentStartDate being stale somehow, then the CommittedSlotTime would not be negative.
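
To put a number on the clock-jump hypothesis, here is a minimal Python sketch that compares the two timestamps quoted above. The function and variable names are illustrative only and are not part of HTCondor.

```python
from datetime import datetime, timezone

# Timestamps quoted from the job ad in this thread.
SCHEDD_CLOCK = 1521637146  # JobCurrentStartDate (set by the schedd)
SHADOW_CLOCK = 1521445871  # JobCurrentStartExecutingDate (set by the shadow)


def apparent_clock_jump(schedd_ts: int, shadow_ts: int) -> int:
    """Seconds by which the schedd's reading is ahead of the shadow's."""
    return schedd_ts - shadow_ts


jump = apparent_clock_jump(SCHEDD_CLOCK, SHADOW_CLOCK)
print(jump)                     # 191275 seconds
print(round(jump / 86400, 2))   # 2.21 days

# Human-readable UTC times, for comparing against daemon logs.
for label, ts in (("schedd", SCHEDD_CLOCK), ("shadow", SHADOW_CLOCK)):
    print(label, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```

If the two daemons had read a consistent clock, this difference should have been only a few seconds.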

-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Thursday, April 5, 2018 10:12 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] History file with incorrect JobCurrentStartDate? - version windows 8.4.4

This is not a known bug, but it seems similar to this bug:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6626

which will be fixed in the upcoming 8.6.11 release.

Could you send me the entire job ad so I can have a look?

thanks
-tj


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Thursday, April 5, 2018 2:00 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] History file with incorrect JobCurrentStartDate? - version windows 8.4.4

Hi All

We have a number of windows submit nodes in our HTCondor pool.

We have reporting scripts on a linux machine that download all the history files and can
provide monthly (or daily) usage on a per user basis.

In our latest report for March 2018 we noticed something strange: negative run times!
After tracking back through the scripts we eventually found the answer in the history files themselves.

For example, for one job these are the relevant values in the history file:

JobCurrentStartDate                1521637146
CompletionDate                     1521450290
EnteredCurrentStatus               1521450290
JobCurrentStartTransferOutputDate  1521450290
JobFinishedHookDone                1521450290
LastJobLeaseRenewal                1521450290
JobCurrentStartExecutingDate       1521445871
LastMatchTime                      1521445870
LastVacateTime                     1521444418
JobLastStartDate                   1521442616
LastRejMatchTime                   1521436370
JobStartDate                       1521192534
QDate                              1521181266
CumulativeSlotTime                 45124
RemoteWallClockTime                45124
RemoteUserCpu                      2970
RemoteSysCpu                       1286
CommittedSuspensionTime            0
CumulativeSuspensionTime           0
CommittedSlotTime                  -186856
CommittedTime                      -186856

Note the two negative committed time values. These are equal to (CompletionDate - JobCurrentStartDate).

In fact the JobCurrentStartDate is just over 2 days AFTER the CompletionDate!?
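
The arithmetic can be checked with a short Python sketch. The dictionary simply holds the values listed above; the helper name is hypothetical.

```python
# Values copied from the history file excerpt in this message.
ad = {
    "JobCurrentStartDate": 1521637146,
    "CompletionDate": 1521450290,
    "CommittedSlotTime": -186856,
    "CommittedTime": -186856,
}


def committed_from_dates(job_ad: dict) -> int:
    """CompletionDate minus JobCurrentStartDate, in seconds."""
    return job_ad["CompletionDate"] - job_ad["JobCurrentStartDate"]


derived = committed_from_dates(ad)
print(derived)  # -186856
# Both committed-time attributes match the derived value.
print(derived == ad["CommittedSlotTime"] == ad["CommittedTime"])  # True
# Size of the discrepancy in days.
print(round(abs(derived) / 86400, 2))  # 2.16
```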

For the moment we will need to change our scripts to NOT use the CommittedSlotTime BUT
calculate it from (CompletionDate - JobLastStartDate).
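
A minimal sketch of that workaround in Python, assuming each job ad has already been parsed into a dictionary of integers. The parsing step, the function names, and the "suspect" test are assumptions for illustration, not part of our actual scripts or of HTCondor.

```python
def is_suspect(job_ad: dict) -> bool:
    """Flag ads whose current start date lies after the completion date."""
    return job_ad["JobCurrentStartDate"] > job_ad["CompletionDate"]


def fallback_run_time(job_ad: dict) -> int:
    """Run time in seconds, computed as CompletionDate - JobLastStartDate."""
    return job_ad["CompletionDate"] - job_ad["JobLastStartDate"]


# The example job from this message.
example = {
    "JobCurrentStartDate": 1521637146,
    "CompletionDate": 1521450290,
    "JobLastStartDate": 1521442616,
    "CommittedSlotTime": -186856,
}

print(is_suspect(example))         # True
print(fallback_run_time(example))  # 7674
```

Flagging the suspect ads separately makes it easy to count how many history records are affected before trusting the monthly totals.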

Is this a known bug? Or something that someone has come across before?
I've had a look through the 8.2.* and 8.4.* release notes and bug fixes but couldn't see anything.

Thanks for any help.

Cheers

Greg

P.S. we will soon(ish) be upgrading our submit nodes from win2008 to win2016 and will upgrade
to the latest 8.6.* version then. Meanwhile the history files already exist so we will kludge
a workaround.



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
