Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] Monitoring condor nodes with hobbit

Date: Wed, 8 Jan 2014 07:54:17 -0800
From: Lans Carstensen <lans.carstensen@xxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Monitoring condor nodes with hobbit

Aside from up/down monitoring to ensure that your collectors are
healthy, submissions are working, and startd nodes haven't fallen out
of the pool, the real monitoring value added in the last few years is
in the operational statistics included in the negotiator and schedd
daemon ClassAds.  It's worth your while to collect and graph some of
those stats.  I covered some of the graphs we use a couple of years
ago at HTCondor Week.

http://research.cs.wisc.edu/htcondor/CondorWeek2012/presentations/carstensen-dreamworks.pdf
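
As a rough illustration of collecting those stats, the sketch below parses
the text printed by `condor_status -schedd -long` and keeps a few numeric
attributes that could be fed to a grapher. It is only a sketch under stated
assumptions, not the setup described in the slides: the helper names are
made up for this example, and the attribute list is an assumed starting
point (which attributes a schedd ad actually carries depends on the
HTCondor version and configuration).

```python
import subprocess

# Assumed starting set of schedd ClassAd attributes worth graphing;
# adjust to whatever your HTCondor version actually publishes.
STATS = (
    "TotalRunningJobs",
    "TotalIdleJobs",
    "TotalHeldJobs",
    "RecentDaemonCoreDutyCycle",
)


def parse_long_ads(text):
    """Split `condor_status -long` style output into one dict per ad.

    Ads are separated by blank lines; each attribute is "Name = value".
    Values are kept as strings, with surrounding double quotes removed.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            current[name] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


def extract_stats(ad, wanted=STATS):
    """Return the wanted attributes that are present and numeric."""
    stats = {}
    for name in wanted:
        try:
            stats[name] = float(ad[name])
        except (KeyError, ValueError):
            # Attribute missing or not a plain number: skip it.
            pass
    return stats


def collect_schedd_stats(timeout=30):
    """Hypothetical helper: query the collector for schedd ads.

    Requires condor_status on PATH; raises if the command fails or
    does not answer within `timeout` seconds.
    """
    result = subprocess.run(
        ["condor_status", "-schedd", "-long"],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return {
        ad.get("Name", "unknown"): extract_stats(ad)
        for ad in parse_long_ads(result.stdout)
    }
```

Calling `collect_schedd_stats()` from a cron job and writing the returned
numbers to whatever graphing tool you already run would be one way to use
this. Querying the collector rather than each schedd directly also avoids
adding load to a schedd that is already busy.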

-- Lans Carstensen

On Wed, Jan 8, 2014 at 6:40 AM, Cody Belcher <codytrey@xxxxxxxxxxxxxxxx> wrote:
> I suppose my end goal is to easily see when a node has an issue, but you are
> right, I do get emails when, say, the schedd crashes. Without any
> extra configuration I can use hobbit to see which hosts are on, and that
> will work for my needs.
>
> Thanks,
>
> Cody
>
>
> On 01/08/2014 08:13 AM, Ben Cotton wrote:
>>
>> Cody,
>>
>> When I was at Purdue, I tried monitoring HTCondor servers (i.e. not
>> execute nodes) with Nagios. I eventually removed the checks because
>> they didn't add value. The condor_master does a good job of making
>> sure the daemons are running. I did get alerts for the schedd checks,
>> but they turned out to be false alarms when the schedd was just too
>> busy to answer the condor_q from Nagios. (I suppose that's an issue in
>> itself, but it wasn't what we were checking for).
>>
>> I guess the point of this story is to ask what exactly you want to
>> check and why. Knowing that makes it easier to offer guidance.
>>
>>
>> Thanks,
>> BC
>>

Follow-Ups:
- Re: [HTCondor-users] Monitoring condor nodes with hobbit
  - From: Brian Bockelman

References:
- [HTCondor-users] Monitoring condor nodes with hobbit
  - From: Cody Belcher
- Re: [HTCondor-users] Monitoring condor nodes with hobbit
  - From: Ben Cotton
- Re: [HTCondor-users] Monitoring condor nodes with hobbit
  - From: Cody Belcher
