
Re: [HTCondor-users] Ganglia Heartbeats sent 0



On your 1-node pool, gangliad gets 8 daemon ads, but none of them are published to ganglia. That means either the ads or the machine names do not match. You increased GANGLIAD_VERBOSITY to 10, so metrics are not being filtered out by verbosity level. So, unless /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics is empty, you are not matching machine names.

First, make sure that the 00_default_metrics is not empty.
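
A quick way to check this from the shell is sketched below. The helper name metrics_file_has_content is made up for illustration and is not part of HTCondor; the path is the one reported in your GangliadLog.

```shell
# Hypothetical helper (not an HTCondor tool): succeeds only if the file
# exists and contains at least one non-whitespace character.
metrics_file_has_content() {
    [ -f "$1" ] && grep -q '[^[:space:]]' "$1"
}

# Path copied from the "Reading metric definitions from ..." log line.
f=/home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics

if metrics_file_has_content "$f"; then
    echo "metric definitions present in $f"
else
    echo "metric definitions missing or empty: $f"
fi
```

This only tells you that the file has some content. It does not validate the metric definitions themselves.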

Then, I recommend setting
    GANGLIA_SEND_DATA_FOR_ALL_HOSTS = true

This setting is used to inject metrics for hosts not being monitored by ganglia (typically Windows hosts or other hosts without a local gmond). You may see the HTCondor metrics appear under a different hostname. Whenever the HTCondor gangliad propagates metrics to hosts not monitored by ganglia, it needs to send the heartbeats for those hosts. Sending 0 heartbeats is not in and of itself indicative of a problem.
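
If you want to see which names differ, something along these lines may help. It is only a sketch: the helper name names_missing_from_ganglia is made up, and it assumes that the gstat --mpifile output lists one host per line, optionally followed by ":cpus". It compares names exactly as written, which may not be exactly how gangliad compares them.

```shell
# Hypothetical helper (not an HTCondor tool).
# $1: file with one HTCondor Machine name per line
# $2: file with one ganglia host per line, optionally "host:cpus" (assumed format)
# Prints the names found in $1 that do not appear in $2.
names_missing_from_ganglia() {
    nm_condor=$(mktemp)
    nm_ganglia=$(mktemp)
    sort -u "$1" > "$nm_condor"
    # Drop an optional ":cpus" suffix before comparing.
    cut -d: -f1 "$2" | sort -u > "$nm_ganglia"
    # comm -23 keeps lines that are only in the first (sorted) file.
    comm -23 "$nm_condor" "$nm_ganglia"
    rm -f "$nm_condor" "$nm_ganglia"
}

# Possible usage on the pool itself (file names are arbitrary):
#   condor_status -any -af Machine > condor_names.txt
#   gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649 > ganglia_names.txt
#   names_missing_from_ganglia condor_names.txt ganglia_names.txt
```

Any name it prints is a machine name in the daemon ads that ganglia does not list, for example a fully qualified name on one side and a short name on the other.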

Hopefully, that will get you a little further along. Let us know what you find.

...Tim

On 03/09/2015 12:59 PM, Ricardo Oda wrote:
Hello,

I'm trying to learn how to set up ganglia to monitor a condor pool.

I'm currently working on localhost to make things easier. I configured ganglia and it's working to monitor this 1-node cluster. The default metrics of gmond.conf are working fine and appear on the web frontend, but I'm having trouble getting the condor metrics.

In the GangliadLog I have:

03/09/15 14:20:28 ******************************************************
03/09/15 14:20:28 ** condor_gangliad (CONDOR_GANGLIAD) STARTING UP
03/09/15 14:20:28 ** /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/libexec/condor_gangliad
03/09/15 14:20:28 ** SubsystemInfo: name=GANGLIAD type=DAEMON(12) class=DAEMON(1)
03/09/15 14:20:28 ** Configuration: subsystem:GANGLIAD local:<NONE> class:DAEMON
03/09/15 14:20:28 ** $CondorVersion: 8.3.4 Mar 02 2015 BuildID: 304666 $
03/09/15 14:20:28 ** $CondorPlatform: x86_64_Ubuntu14 $
03/09/15 14:20:28 ** PID = 8922
03/09/15 14:20:28 ** Log last touched 3/9 14:20:11
03/09/15 14:20:28 ******************************************************
03/09/15 14:20:28 Using config source: /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor_config
03/09/15 14:20:28 Using local config sources:
03/09/15 14:20:28    /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/local.xxxx/condor_config.local
03/09/15 14:20:28 config Macros = 58, Sorted = 58, StringBytes = 1697, TablesBytes = 2136
03/09/15 14:20:28 CLASSAD_CACHING is ENABLED
03/09/15 14:20:28 Daemon Log is logging: D_ALWAYS D_ERROR
03/09/15 14:20:28 Daemoncore: Listening at <0.0.0.0:45401> on TCP (ReliSock) and UDP (SafeSock).
03/09/15 14:20:28 DaemonCore: command socket at <xxx.xxx.xx.xx:45401>
03/09/15 14:20:28 DaemonCore: private command socket at <xxx.xxx.xx.xx:45401>
03/09/15 14:20:28 Testing /usr/bin/gmetric
03/09/15 14:20:28 Loading libganglia libganglia.so
03/09/15 14:20:28 Will use libganglia to interact with ganglia.
03/09/15 14:20:28 Will perform stats publication every GANGLIAD_INTERVAL=60 seconds.
03/09/15 14:20:28 Reading metric definitions from /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics
03/09/15 14:20:48 Starting update...
03/09/15 14:20:48 Ganglia is monitoring 1 hosts
03/09/15 14:20:48 Got 8 daemon ads
03/09/15 14:20:48 Heartbeats sent: 0
03/09/15 14:21:08 Starting update...
03/09/15 14:21:08 Heartbeats sent: 0


Here are my configs of ganglia:

$condor_config_val -dump |grep -i ganglia
DAEMON_LIST = COLLECTOR MASTER NEGOTIATOR SCHEDD STARTD GANGLIAD
GANGLIA_CONFIG = /etc/ganglia/gmond.conf
GANGLIA_GMETRIC = /usr/bin/gmetric
GANGLIA_GSTAT_COMMAND = gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649
GANGLIA_LIB = libganglia.so
GANGLIA_LIB64_PATH = /lib64,/usr/lib64,/usr/local/lib64
GANGLIA_LIB_PATH = /lib,/usr/lib,/usr/local/lib
GANGLIA_SEND_DATA_FOR_ALL_HOSTS = false
GANGLIAD = $(LIBEXEC)/condor_gangliad
GANGLIAD_INTERVAL = 60
GANGLIAD_LOG = $(LOG)/GangliadLog
GANGLIAD_METRICS_CONFIG_DIR = /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d
GANGLIAD_PER_EXECUTE_NODE_METRICS = true
GANGLIAD_REQUIREMENTS = 
GANGLIAD_VERBOSITY = 10
MAX_GANGLIAD_LOG = $(MAX_DEFAULT_LOG)

The GANGLIAD daemon is running, but I think it's not transmitting its monitoring data to ganglia.
Do I have to do something to include the condor default metrics in ganglia?

I'm not sure why, but I keep getting "Heartbeats sent: 0". I would appreciate some help.

Thanks in advance,
Ricardo Oda



-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736