
Re: [HTCondor-users] Monitoring condor nodes with hobbit

I suppose my end goal is to easily see when a node has an issue, but you are right: I already get emails when, say, the schedd crashes. Without any extra configuration I can use Hobbit to see which hosts are up, and that will work for my needs.



On 01/08/2014 08:13 AM, Ben Cotton wrote:

When I was at Purdue, I tried monitoring HTCondor servers (i.e. not
execute nodes) with Nagios. I eventually removed the checks because
they didn't add value. The condor_master does a good job of making
sure the daemons are running. I did get alerts for the schedd checks,
but they turned out to be false alarms when the schedd was just too
busy to answer the condor_q from Nagios. (I suppose that's an issue in
itself, but it wasn't what we were checking for).

I guess the point of this story is to ask what exactly you want to
check and why. Knowing that makes it easier to offer guidance.
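
One way to avoid the kind of false alarm described above is to have the check report a slow answer differently from a failed one. The following is only a minimal sketch, not anything from the original setup: the function name run_check, the 30-second default, and the mapping of a timeout to WARNING are all assumptions made for illustration. The numeric values follow the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
import subprocess

# Nagios-style plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def run_check(cmd, timeout_s=30):
    """Run a check command and classify the outcome.

    Hypothetical helper: a timeout is reported as WARNING (the daemon
    may simply be busy) rather than CRITICAL (the check really failed).
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return WARNING, "no answer within %d s (possibly just busy)" % timeout_s
    except OSError as exc:
        # For example, the command is not installed on this host.
        return CRITICAL, "could not run check: %s" % exc
    if result.returncode != 0:
        return CRITICAL, "check exited with status %d" % result.returncode
    return OK, "check answered"
```

A wrapper script could call run_check(["condor_q"]), print the message, and exit with the returned status. Whether a busy schedd should be a warning or be ignored entirely depends on what the check is meant to catch.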