
Re: [HTCondor-users] Ganglia Heartbeats sent 0



Hello Tim,

Indeed it was a problem of matching machine names.

In my gmond and gmetad config files I was using localhost, while in my condor config I was using the FQDN. Setting the option:

 GANGLIA_SEND_DATA_FOR_ALL_HOSTS = true

made it clear, because both localhost and my FQDN appeared in the node list of the web frontend. After setting the gmetad and gmond configs to use my FQDN, the problem was solved.

I still get 0 heartbeats, but as you said that is not a problem. I now get some metrics sent:
...
03/11/15 11:02:37 Ganglia metrics sent: 112
03/11/15 11:02:37 Heartbeats sent: 0
...
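
A minimal sketch, assuming the log line format shown in the excerpt above, of how these two counters could be pulled out of the GangliadLog to check whether metrics are being sent. The function name summarize_gangliad_log and the regular expression are hypothetical, not part of HTCondor:

```python
import re

# Matches counter lines such as "03/11/15 11:02:37 Ganglia metrics sent: 112".
# The pattern is an assumption based only on the log excerpts in this thread.
COUNTER_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(Ganglia metrics sent|Heartbeats sent): (\d+)\s*$"
)

def summarize_gangliad_log(lines):
    """Return (timestamp, counter_name, value) tuples for each counter line."""
    results = []
    for line in lines:
        match = COUNTER_RE.match(line.strip())
        if match:
            results.append((match.group(1), match.group(2), int(match.group(3))))
    return results

sample = [
    "03/11/15 11:02:37 Ganglia metrics sent: 112",
    "03/11/15 11:02:37 Heartbeats sent: 0",
]

for timestamp, name, value in summarize_gangliad_log(sample):
    print(timestamp, name, value)
```

A non-zero "Ganglia metrics sent" value together with "Heartbeats sent: 0" matches the working state described here; only the metrics counter indicates whether data is reaching ganglia.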

Thanks,
Ricardo Oda

On Mon, Mar 9, 2015 at 9:16 PM, Tim Theisen <tim@xxxxxxxxxxx> wrote:
So, on your 1 node pool, you get 8 daemon ads but none are published to ganglia. So, either the ads or machine names do not match. You increased the GANGLIAD_VERBOSITY to 10. So, unless /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics is empty, you are not matching machine names.

First, make sure that the 00_default_metrics is not empty.

Then, I recommend setting
    GANGLIA_SEND_DATA_FOR_ALL_HOSTS = true

This setting is used to inject metrics for hosts not being monitored by ganglia (typically Windows hosts or other hosts without a local gmond). You may see the HTCondor metrics appear under a different hostname. Whenever the HTCondor gangliad propagates metrics to hosts not monitored by ganglia, it needs to send the heartbeats for those hosts. Sending 0 heartbeats is not in and of itself indicative of a problem.

Hopefully, that will get you a little further along. Let us know what you find.

...Tim


On 03/09/2015 12:59 PM, Ricardo Oda wrote:
Hello,

I am trying to learn how to set up ganglia to monitor a condor pool.

I'm currently working on localhost to make things easier. I configured ganglia and it's working to monitor this 1 node cluster. The default metrics of gmond.conf are working fine and appear on the web frontend, but I'm having trouble getting the condor metrics.

In the GangliadLog I have:

03/09/15 14:20:28 ******************************************************
03/09/15 14:20:28 ** condor_gangliad (CONDOR_GANGLIAD) STARTING UP
03/09/15 14:20:28 ** /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/libexec/condor_gangliad
03/09/15 14:20:28 ** SubsystemInfo: name=GANGLIAD type=DAEMON(12) class=DAEMON(1)
03/09/15 14:20:28 ** Configuration: subsystem:GANGLIAD local:<NONE> class:DAEMON
03/09/15 14:20:28 ** $CondorVersion: 8.3.4 Mar 02 2015 BuildID: 304666 $
03/09/15 14:20:28 ** $CondorPlatform: x86_64_Ubuntu14 $
03/09/15 14:20:28 ** PID = 8922
03/09/15 14:20:28 ** Log last touched 3/9 14:20:11
03/09/15 14:20:28 ******************************************************
03/09/15 14:20:28 Using config source: /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor_config
03/09/15 14:20:28 Using local config sources:
03/09/15 14:20:28    /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/local.xxxx/condor_config.local
03/09/15 14:20:28 config Macros = 58, Sorted = 58, StringBytes = 1697, TablesBytes = 2136
03/09/15 14:20:28 CLASSAD_CACHING is ENABLED
03/09/15 14:20:28 Daemon Log is logging: D_ALWAYS D_ERROR
03/09/15 14:20:28 Daemoncore: Listening at <0.0.0.0:45401> on TCP (ReliSock) and UDP (SafeSock).
03/09/15 14:20:28 DaemonCore: command socket at <xxx.xxx.xx.xx:45401>
03/09/15 14:20:28 DaemonCore: private command socket at <xxx.xxx.xx.xx:45401>
03/09/15 14:20:28 Testing /usr/bin/gmetric
03/09/15 14:20:28 Loading libganglia libganglia.so
03/09/15 14:20:28 Will use libganglia to interact with ganglia.
03/09/15 14:20:28 Will perform stats publication every GANGLIAD_INTERVAL=60 seconds.
03/09/15 14:20:28 Reading metric definitions from /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics
03/09/15 14:20:48 Starting update...
03/09/15 14:20:48 Ganglia is monitoring 1 hosts
03/09/15 14:20:48 Got 8 daemon ads
03/09/15 14:20:48 Heartbeats sent: 0
03/09/15 14:21:08 Starting update...
03/09/15 14:21:08 Heartbeats sent: 0


Here are my configs of ganglia:

$condor_config_val -dump |grep -i ganglia
DAEMON_LIST = COLLECTOR MASTER NEGOTIATOR SCHEDD STARTD GANGLIAD
GANGLIA_CONFIG = /etc/ganglia/gmond.conf
GANGLIA_GMETRIC = /usr/bin/gmetric
GANGLIA_GSTAT_COMMAND = gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649
GANGLIA_LIB = libganglia.so
GANGLIA_LIB64_PATH = /lib64,/usr/lib64,/usr/local/lib64
GANGLIA_LIB_PATH = /lib,/usr/lib,/usr/local/lib
GANGLIA_SEND_DATA_FOR_ALL_HOSTS = false
GANGLIAD = $(LIBEXEC)/condor_gangliad
GANGLIAD_INTERVAL = 60
GANGLIAD_LOG = $(LOG)/GangliadLog
GANGLIAD_METRICS_CONFIG_DIR = /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d
GANGLIAD_PER_EXECUTE_NODE_METRICS = true
GANGLIAD_REQUIREMENTS =
GANGLIAD_VERBOSITY = 10
MAX_GANGLIAD_LOG = $(MAX_DEFAULT_LOG)

The GANGLIAD daemon is running, but I think it's not transmitting its monitoring data to ganglia.
Do I have to do something to include the condor default metrics in ganglia?

I'm not sure why, but I keep getting "Heartbeats sent: 0". I would appreciate some help.

Thanks in advance,
Ricardo Oda


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736
