Re: [HTCondor-users] Monitoring condor nodes with hobbit
- Date: Wed, 8 Jan 2014 09:13:17 -0500
- From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Monitoring condor nodes with hobbit
When I was at Purdue, I tried monitoring HTCondor servers (i.e. not
execute nodes) with Nagios. I eventually removed the checks because
they didn't add value. The condor_master does a good job of making
sure the daemons are running. I did get alerts for the schedd checks,
but they turned out to be false alarms when the schedd was just too
busy to answer the condor_q from Nagios. (I suppose that's an issue in
itself, but it wasn't what we were checking for.)
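
To illustrate the false-alarm problem, here is a minimal sketch of a check
that separates "the query timed out" from "the query failed". It is not the
check we ran at Purdue. The function name check_command, the 30-second
default timeout, and the choice to report a timeout as WARNING rather than
CRITICAL are all assumptions made for illustration. The exit codes are the
standard Nagios plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_command(cmd, timeout_s=30):
    """Run cmd and map the outcome to a (nagios_code, message) pair.

    A timeout is reported as WARNING, not CRITICAL, on the assumption
    that a slow answer usually means a busy daemon rather than a dead one.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return WARNING, "no answer within %ss (daemon may just be busy)" % timeout_s
    except OSError as exc:
        # The command itself could not be started (e.g. not installed).
        return UNKNOWN, "could not run %s: %s" % (cmd[0], exc)
    if result.returncode == 0:
        return OK, "query answered"
    return CRITICAL, "query failed with exit status %d" % result.returncode


def main():
    # Hypothetical usage: query the local schedd with a plain condor_q.
    code, message = check_command(["condor_q"])
    print(message)
    return code
```

A plugin script would end with sys.exit(main()) so that Nagios sees the
exit code. Whether a timeout deserves WARNING or something else depends on
what you are actually trying to detect.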
I guess the point of this story is to ask what exactly you want to
check and why. Knowing that makes it easier to offer guidance.