
[HTCondor-users] Outage timekeeping?



Has anyone cooked up a good way to keep statistics on exec node outages? I’m looking for something comparable to the outage statistics SLURM reports through sreport.

 

I’ve got a couple of ideas, but I’m not sure how well they’d work or whether they’d be efficient and reliable. One idea is a startd cron or schedd cron job that writes the current time to a state file, and then adds to a “downtime” value whenever the gap since the previous timestamp is larger than the interval at which the job runs.
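
To make the state-file idea concrete, here is a rough sketch of what such a cron script might look like. It is only an illustration under assumptions: the function name update_heartbeat, the state file location, the slack factor, and the OutageSecondsTotal attribute name are all made up for this example, and it assumes the script runs as a startd cron job whose "Attribute = value" lines on stdout get merged into the machine ad.

```python
import json
import os
import tempfile
import time


def update_heartbeat(state_path, interval, now=None, slack=2.0):
    """Record a heartbeat and return the accumulated downtime in seconds.

    interval is how often the cron job is expected to run. A gap longer
    than interval * slack is treated as an outage (slack is a hypothetical
    tolerance for late or jittery runs).
    """
    now = time.time() if now is None else now

    # First run: no state file yet, so start with zero downtime.
    state = {"last_seen": now, "downtime": 0.0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    gap = now - state["last_seen"]
    if gap > interval * slack:
        # Count only the time beyond one normal interval as downtime.
        state["downtime"] += gap - interval

    state["last_seen"] = now

    # Write to a temporary file and rename, so a crash mid-write does not
    # leave a truncated state file behind.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)

    return state["downtime"]


if __name__ == "__main__":
    # Hypothetical location; a real deployment would want a path that
    # survives reboots.
    path = os.path.join(tempfile.gettempdir(), "outage_state.json")
    total = update_heartbeat(path, interval=60)
    # Hypothetical attribute name, printed in "Attribute = value" form.
    print(f"OutageSecondsTotal = {int(total)}")
```

The obvious weakness is the one already mentioned: the accumulated value lives only in the state file, so losing that file loses the history.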

 

However, I’m wondering whether there is an established way to create persistent machine classads without involving state files.

 

Thanks for any ideas you might have.

 

Michael V Pelletier

Principal Engineer


C: +1 339.293.9149
michael.v.pelletier@xxxxxxx


Raytheon Technologies

Information Technology

50 Apple Hill Drive

Tewksbury, MA 01876-1198

 
