Re: [HTCondor-users] Monitoring condor nodes with hobbit
- Date: Wed, 08 Jan 2014 08:40:24 -0600
- From: Cody Belcher <codytrey@xxxxxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Monitoring condor nodes with hobbit
I suppose my end goal is to easily see when a node has an issue, but you
are right: I do get emails when, say, the schedd crashes. Without any
extra configuration I can use hobbit to see which hosts are on, and
that will work for my needs.
On 01/08/2014 08:13 AM, Ben Cotton wrote:
When I was at Purdue, I tried monitoring HTCondor servers (i.e. not
execute nodes) with Nagios. I eventually removed the checks because
they didn't add value. The condor_master does a good job of making
sure the daemons are running. I did get alerts for the schedd checks,
but they turned out to be false alarms when the schedd was just too
busy to answer the condor_q from Nagios. (I suppose that's an issue in
itself, but it wasn't what we were checking for).
I guess the point of this story is to ask what exactly you want to
check and why. Knowing that makes it easier to offer guidance.
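
For anyone who does want to keep a schedd check, the false alarms described above (a schedd too busy to answer condor_q in time) could be separated from real failures by treating a timeout as a warning rather than a critical alert. This is only a rough sketch, not what was actually run at Purdue: the function name check_command and the 30-second timeout are made up for illustration, and the only things taken as given are the condor_q command and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_command(cmd, timeout_s=30.0):
    """Run cmd and return a (nagios_exit_code, message) pair.

    Hypothetical helper: a timeout is reported as WARNING, because a
    busy schedd may simply be slow to answer, while a non-zero exit
    status is reported as CRITICAL.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # the queue listing itself is not needed
            stderr=subprocess.PIPE,
            timeout=timeout_s,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return WARNING, f"no answer within {timeout_s}s (schedd may just be busy)"
    except FileNotFoundError:
        return UNKNOWN, f"command not found: {cmd[0]}"

    if result.returncode != 0:
        return CRITICAL, f"exit status {result.returncode}: {result.stderr.strip()}"
    return OK, "command answered"
```

A wrapper script would call check_command(["condor_q"]), print the message, and exit with the returned code. Whether a slow schedd deserves a warning at all is a local decision; the only point here is that "slow" and "down" end up as different states.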