
Re: [HTCondor-users] Outage timekeeping?



However, I'm wondering if there is an established way to create persistent machine classads without involving state files.

	Have you looked at OFFLINE and ABSENT ads in the collector?

Absent ads are written to disk and persist across restarts and reboots. An ad becomes "absent", IIRC, when it would otherwise age out of the collector.

(Ads age out of the collector after a few minutes without an update; I forget which knob controls the rate. IIRC, if an ad isn't updated for three consecutive update intervals, the collector throws it away, figuring that Something Terrible has happened to either the daemon sending it or the network in between.)
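The aging rule described above boils down to a single comparison. A minimal Python sketch, assuming a 300-second update interval and the three-interval cutoff from the text; the constant names are made up for illustration and are not HTCondor configuration knobs:

```python
# Hypothetical sketch of the "three missed updates" staleness rule.
# UPDATE_INTERVAL_SECONDS and MISSED_UPDATES_ALLOWED are illustrative
# assumptions, not HTCondor knob names or defaults.
UPDATE_INTERVAL_SECONDS = 300
MISSED_UPDATES_ALLOWED = 3


def is_stale(last_update: int, now: int,
             update_interval: int = UPDATE_INTERVAL_SECONDS,
             missed: int = MISSED_UPDATES_ALLOWED) -> bool:
    """Return True once more than `missed` update intervals pass with no refresh."""
    return now - last_update > missed * update_interval


if __name__ == "__main__":
    now = 10_000
    print(is_stale(now - 600, now))   # two intervals late -> False
    print(is_stale(now - 1200, now))  # four intervals late -> True
```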

The intended use of absent ads is to make it possible to check, via the collector, which machines "should" be in your pool as opposed to which ones actually are. (Obviously, if your pool is glide-ins, this is mostly useless.) There's a knob, ABSENT_REQUIREMENTS, that determines which ads you keep (e.g., if you only want uptime numbers for startds you control).
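Checking which machines "should" be in the pool against which ones are amounts to splitting the collector's ads into live and absent ones. A minimal Python sketch, assuming the ads have already been fetched as plain dicts and that absent ads carry an `Absent` attribute set to true; the `keep` predicate is a hypothetical stand-in for the knob that selects which ads to keep, and the host names are invented:

```python
# Hypothetical sketch: split ads into present and missing machines,
# considering only ads that a site-specific predicate accepts.
from typing import Callable, Dict, List, Tuple

Ad = Dict[str, object]


def present_and_missing(ads: List[Ad],
                        keep: Callable[[Ad], bool]) -> Tuple[List[str], List[str]]:
    """Return (present, missing) machine names among ads that `keep` accepts."""
    present: List[str] = []
    missing: List[str] = []
    for ad in ads:
        if not keep(ad):
            continue  # e.g. glide-ins we don't care to track
        target = missing if ad.get("Absent") is True else present
        target.append(str(ad["Name"]))
    return sorted(present), sorted(missing)


def ours(ad: Ad) -> bool:
    # Illustrative filter: only track machines in our own (invented) domain.
    return str(ad["Name"]).endswith(".example.org")


if __name__ == "__main__":
    ads = [
        {"Name": "node1.example.org"},
        {"Name": "node2.example.org", "Absent": True},
        {"Name": "glidein-42.example.net", "Absent": True},
    ]
    print(present_and_missing(ads, ours))
    # (['node1.example.org'], ['node2.example.org'])
```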

The absent ad will contain the ad's usual attributes, including the last update time, which will give you an approximation of how long the machine has been down at the time that you checked. This won't be exactly the machine's downtime, or even the startd's downtime, but since (generally speaking) a startd that's not in the collector can't do useful work, it may be a number you care about, and close enough to what you actually want.
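The approximation is just the current time minus the ad's last update time. A sketch assuming the ads arrive as plain dicts and that the timestamp lives in an attribute named `LastHeardFrom` (an assumption worth checking against your own absent ads):

```python
# Hypothetical sketch: approximate how long each absent machine has been
# missing as (now - time of its last update). The attribute name
# LastHeardFrom is an assumption about where that timestamp lives.
import time
from typing import Dict, List, Optional


def approximate_downtime(absent_ads: List[Dict[str, object]],
                         now: Optional[float] = None) -> Dict[str, float]:
    """Map machine name -> seconds since its last update reached the collector."""
    if now is None:
        now = time.time()
    # Ads without the timestamp are skipped rather than guessed at.
    return {str(ad["Name"]): now - float(ad["LastHeardFrom"])
            for ad in absent_ads if "LastHeardFrom" in ad}


if __name__ == "__main__":
    ads = [{"Name": "node2.example.org", "LastHeardFrom": 1_700_000_000}]
    print(approximate_downtime(ads, now=1_700_003_600))
    # {'node2.example.org': 3600.0}
```

Since the machine could have failed at any point after its last successful update, this figure runs high by up to roughly one update interval (assuming updates were arriving on schedule), which fits the "close enough" caveat above.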

Absent ads also eventually age out, and are also removed when an update for the same ad arrives.
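The two ways an absent ad disappears can be modeled as a small store. This is a hypothetical model, not the collector's implementation: the class and method names are invented, expiry is measured from when the ad became absent (an assumption), and the 30-day `max_age` in the example is an illustrative value, not a documented default:

```python
# Hypothetical model of how absent ads go away: a fresh update for the
# same ad removes the absent copy, or it expires after max_age seconds.
from typing import Dict, List


class AbsentAdStore:
    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self._absent_since: Dict[str, float] = {}  # name -> time it went absent

    def mark_absent(self, name: str, now: float) -> None:
        # Keep the original timestamp if the ad is already absent.
        self._absent_since.setdefault(name, now)

    def on_update(self, name: str) -> None:
        # A live update for the same ad supersedes the absent copy.
        self._absent_since.pop(name, None)

    def expire(self, now: float) -> None:
        # Drop absent ads that have been kept for max_age seconds or longer.
        self._absent_since = {n: t for n, t in self._absent_since.items()
                              if now - t < self.max_age}

    def names(self) -> List[str]:
        return sorted(self._absent_since)


if __name__ == "__main__":
    store = AbsentAdStore(max_age=30 * 86400)
    store.mark_absent("node1", now=0)
    store.mark_absent("node2", now=0)
    store.on_update("node1")        # node1 reported in again
    print(store.names())            # ['node2']
    store.expire(now=31 * 86400)    # node2 has been absent past max_age
    print(store.names())            # []
```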

- ToddM