
Re: [HTCondor-users] Unable to unset monitors in claim destructor. The StartOfJob* attributes will be stale.



08/18/19 15:26:23 slot2_1: Unable to unset monitors in claim destructor. The StartOfJob* attributes will be stale. (0x1d215c0, (nil))

If GPU monitoring is turned on, GPU usage information is recorded in the slot ad, and assigned to a job as it runs on that slot. When a job starts, we record the slot's current usage in the slot ad; then we compute the job's usage by subtracting this from the ongoing accumulation of usage, until the job ends. Of course, if the claim is deleted, we need to make sure that the information we recorded about the start of the job is deleted, too; otherwise, the slot will report usage for a job that's no longer running. (This won't screw up accounting, because that only counts assigned resources, not actual usage.)
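
To make the bookkeeping concrete, here is a minimal Python sketch of the snapshot-and-subtract idea described above. It is only a model: the class and method names (SlotUsage, job_started, claim_deleted, and so on) are made up for illustration and are not the startd's actual code or attribute names.

```python
# Illustrative model only; all names here are hypothetical, not HTCondor's.
class SlotUsage:
    def __init__(self):
        self.accumulated = 0.0    # ongoing accumulation of usage on the slot
        self.start_of_job = None  # snapshot taken when a job starts

    def record(self, amount):
        """Add newly observed usage to the slot's running total."""
        self.accumulated += amount

    def job_started(self):
        """Remember the slot's usage at the moment the job starts."""
        self.start_of_job = self.accumulated

    def job_usage(self):
        """Usage attributed to the job: running total minus the snapshot."""
        if self.start_of_job is None:
            return None  # no job, so nothing to report
        return self.accumulated - self.start_of_job

    def claim_deleted(self):
        """Drop the snapshot so the slot stops reporting per-job usage."""
        self.start_of_job = None


slot = SlotUsage()
slot.record(10.0)                     # usage from before this job
slot.job_started()
slot.record(2.5)                      # usage while the job runs
usage_during_job = slot.job_usage()   # only the 2.5 recorded after the start
slot.claim_deleted()
usage_after_claim = slot.job_usage()  # None: snapshot was cleaned up
```

If claim_deleted() were skipped, job_usage() would keep returning a growing number for a job that is gone, which is the stale state the warning is about.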

However, in some cases, a claim will be deleted whose ClassAd has already been deleted. In those cases, we can't (presently) determine which monitors to unset, and so we do nothing. This _should_ only happen when the slot is being destroyed, in which case it's harmless, but I've been unable to prove that's the case.
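
As a rough sketch of that failure mode, assuming (hypothetically) that the claim's ad is what tells the cleanup code which monitors it has to unset: the function name, the dictionary layout, and the "monitored_resources" key below are all invented for illustration and do not come from the HTCondor source.

```python
# Illustrative model only; names and data layout are hypothetical.
def unset_monitors(slot_snapshots, claim_ad):
    """Remove the start-of-job snapshots for the resources named in claim_ad.

    Returns True if cleanup ran, False if it had to be skipped.
    """
    if claim_ad is None:
        # The ad is already gone, so there is no way to tell which
        # snapshots belong to this claim. Do nothing; they stay stale.
        return False
    for resource in claim_ad.get("monitored_resources", []):
        slot_snapshots.pop(resource, None)
    return True


snapshots = {"GPUs": 10.0}
skipped = unset_monitors(snapshots, None)  # ad already deleted
stale_after_skip = dict(snapshots)         # snapshot is still there
cleaned = unset_monitors(snapshots, {"monitored_resources": ["GPUs"]})
```

In the skipped case the snapshot survives the claim, which is harmless only if the slot itself is about to go away.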

However, in the course of refreshing my memory about this, Jaime found a place in the code where a one-line change might substantially reduce the occurrence of these warnings; we'll see how that goes.

- ToddM