[Condor-users] What are you using to monitor clients in your pool?

I'm doing a very non-scientific survey of the available software
packages to monitor the clients in our various pools here at Altera. We
currently use Nagios (http://www.nagios.org/) but it's a bit of a beast
and not the most Windows-friendly piece of software out there.

I was wondering what other people out there are using to monitor and
track their execute machines. We're looking to track things like disk
failures, disk full, CPU temperature, CPU load, network load, network
connectivity (pings for uptime) and reboots, across a mix of Linux,
Solaris and Windows machines. Being able to aggregate data from
multiple pools around the world in one spot would also be nice.

Some packages to investigate on my short list are:

Big Brother (http://www.bb4.com/)
Big Sister (http://bigsister.graeff.com/)
Zabbix (http://www.zabbix.com/)

Does anyone have any insight into using these products on a Condor pool? I'm
also toying with just doing this all with Condor Hawkeye scripts. But
this approach, as far as I can tell, suffers from one fatal flaw: Condor
tolerates machines going up and down without sending out notifications.
I need to know when machines die.
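
To make the gap concrete, here is a rough sketch of the kind of external
check I mean: compare a hand-maintained list of expected execute machines
against what the collector currently reports. It assumes condor_status is
on the PATH and that the Machine attribute holds the host name; the
function names and host names are made up for illustration, and this is
not a tested monitoring setup.

```python
import subprocess


def parse_machines(status_output):
    """Return the set of lowercased host names, one per non-empty line."""
    return {
        line.strip().lower()
        for line in status_output.splitlines()
        if line.strip()
    }


def missing_machines(expected, status_output):
    """Return the sorted expected hosts that the collector did not report."""
    reported = parse_machines(status_output)
    wanted = {host.lower() for host in expected}
    return sorted(wanted - reported)


def query_collector():
    """Ask the collector for the Machine attribute of every slot it knows.

    Assumes condor_status is installed; multi-slot machines show up on
    several lines, which parse_machines() collapses into one entry.
    """
    result = subprocess.run(
        ["condor_status", "-format", "%s\n", "Machine"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Run from cron, something like
missing_machines(["exec01.example.com"], query_collector()) would return
the hosts that have dropped out of the pool, and the result could be
mailed out. The catch is that the expected list has to be kept up to date
by hand.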

- Ian
