[Condor-users] Condor monitoring alternatives
- Date: Fri, 8 Feb 2008 14:54:55 -0500
- From: Brent Strong <brs3567@xxxxxxx>
- Subject: [Condor-users] Condor monitoring alternatives
Here at RIT, we've been working on building up a respectable Condor
pool with some success. We're now running into the issue of
monitoring our clients. We have enough client machines that it is now
impossible to visually parse the condor_status output to find
"stragglers", so I'm looking for an automated solution.
I'm specifically looking for a lightweight alternative to Hawkeye,
possibly something we could integrate into, or run alongside, our
quick stats page ( http://stats.rc.rit.edu/condor/ ).
Has anyone written a simple script or similar that contains a master
list of machines that should be up and compares the output of
condor_status to it? It seems to be something that would be very
useful and I'm hoping I can reuse someone else's code.
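For illustration, the comparison described above could be sketched in Python roughly as follows. This is only a minimal sketch under assumptions: the master list is taken to be a plain-text file with one hostname per line (the name machines.txt is hypothetical), the function names are made up for this example, and the hostnames in the list are assumed to match the Machine attribute that condor_status reports.

```python
import subprocess


def parse_machines(text):
    """Return a set of lowercase hostnames, one per line.

    Blank lines and lines starting with '#' are ignored, so the same
    parser works for a hand-maintained master list and for the
    one-name-per-line output requested from condor_status below.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.lower())
    return names


def reporting_machines():
    """Ask the collector which machines currently advertise themselves.

    Multi-slot machines print the same Machine value several times;
    collecting the names into a set removes the duplicates.
    """
    result = subprocess.run(
        ["condor_status", "-format", r"%s\n", "Machine"],
        capture_output=True, text=True, check=True,
    )
    return parse_machines(result.stdout)


def find_stragglers(expected, reporting):
    """Machines on the master list that condor_status did not report."""
    return sorted(expected - reporting)


def main(master_list_path):
    # master_list_path is a hypothetical file such as "machines.txt".
    with open(master_list_path) as handle:
        expected = parse_machines(handle.read())
    for host in find_stragglers(expected, reporting_machines()):
        print(host)
```

Calling main("machines.txt") from cron would print one missing hostname per line, which could then be mailed out or written to a page.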
Ideally, we want a lightweight webpage that lists the machines Condor
is installed on (by hostname, IP, whatever) along with each machine's
status: up and running Condor, up but not running or responding to
Condor, or down. Combining the output of a ping test, condor_status
and a master list of machines, these states should be easy to
determine. My question is: has anyone done this?
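As a rough sketch of how the three states could be derived, assuming the set of expected hostnames and the set of hostnames reported by condor_status have already been gathered: the function names here are hypothetical, and the ping flags assume a Linux-style ping (-c for count, -W for timeout in seconds), so they would need adjusting on other systems.

```python
import subprocess


def is_pingable(host, timeout_s=2):
    """Send one ICMP echo request using the system ping command.

    Assumes Linux-style flags; returns True if ping exits with 0.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(expected, reporting, ping=is_pingable):
    """Map each expected host to one of the three states.

    A host is only pinged if condor_status did not report it, which
    keeps the check cheap when most of the pool is healthy.
    """
    status = {}
    for host in sorted(expected):
        if host in reporting:
            status[host] = "up, running Condor"
        elif ping(host):
            status[host] = "up, Condor not responding"
        else:
            status[host] = "down"
    return status
```

The ping function is passed in as a parameter so the classification logic can be exercised without touching the network; the resulting dictionary could be rendered as a table on a status page.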
Research Computing at RIT