
Re: [HTCondor-users] HTCondor and Ganglia



Hello, thank you for your response.
I have already solved the problem. It turned out that GANGLIAD_VERBOSITY was set to a very low number, so the metrics weren't being sent to the manager.

Another thing I should mention is that I was not seeing some metrics because they were defined only for a specific type of HTCondor machine. For example, I had all my metrics as TargetType="Scheduler" and I was trying to get results from a worker node without the SCHEDD daemon... This was in fact a dumb mistake on my part, but it cost me a whole day...
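
For anyone who runs into the same thing, a metric definition of the kind described above looks roughly like the sketch below. The metric name, description, and units are illustrative and not copied from the shipped 00_default_metrics file; the point is the TargetType line. As far as I understand it, TargetType is matched against the MyType of each daemon ad, so treat that detail as an assumption and check it against the manual for your version. The comments are only explanatory and can be dropped.

```
[
  // Illustrative definition, not copied from the shipped file.
  Name       = "TotalRunningJobs";
  Desc       = "Number of running jobs reported by this schedd";
  Units      = "jobs";
  // Assumption: only ads whose MyType is "Scheduler" match this metric,
  // so a node that runs no schedd publishes nothing for it.
  TargetType = "Scheduler";
]
```

With every metric defined this way, an execute-only node has no matching ad, which is why nothing showed up for it in Ganglia.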

Thank you for your answer!


On 14 July 2017 at 23:11, Tim Theisen <tim@xxxxxxxxxxx> wrote:

My apologies for the delayed response. I thought that I had already replied to this message.

You should look at the log file (/var/log/condor/GangliadLog) to see how many metrics are being sent.

The default installation only monitors the Central Manager and Submit nodes. If you have a large pool, getting many metrics from all the execute nodes can overload the ganglia system.

You can get additional metrics by setting GANGLIAD_VERBOSITY to 1 in your configuration file. The default is 0, which is pretty bare bones. A value of 2 enables many more metrics. You can see which metrics go out at the different verbosity levels in /etc/condor/ganglia.d/00_default_metrics.

The GANGLIAD_PER_EXECUTE_NODE_METRICS boolean can be used to suppress the per-execute-node metrics (while still keeping the aggregate counts).
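
As a minimal sketch, the two settings above could be combined in the local configuration of the host that runs condor_gangliad. The file location and the particular values are assumptions chosen for illustration, not recommendations for every pool.

```
# Local HTCondor configuration on the host running condor_gangliad.
# (Which file this goes in, e.g. one under /etc/condor/config.d/,
# is an assumption; use whatever local config file your site uses.)

# Publish the basic metrics plus those marked with verbosity 1.
GANGLIAD_VERBOSITY = 1

# Assumed usage: False drops the per-execute-node metrics while the
# aggregate counts are still published.
GANGLIAD_PER_EXECUTE_NODE_METRICS = False
```

After changing these, run condor_reconfig (or restart the daemon if the change does not seem to take effect) and check /var/log/condor/GangliadLog again to see how many metrics are now being sent.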

Hopefully, this makes sense.

...Tim


On 06/13/2017 06:36 AM, Fernando Nellmeldin wrote:
Hello all.

We have a cluster with all nodes with CentOS 7. I would like to monitor its usage using Ganglia.
I configured the Ganglia server on the HTCondor collector. This is working: I can access the web service and see some information.

I also installed Ganglia on a few nodes and I can monitor their status (CPU usage, memory, disk, etc.), but the Condor metrics don't show up in the web interface.

In fact, what I don't know is which configuration is needed within Condor to send its metrics to Ganglia. The only thing I did was enable the GANGLIAD daemon on the Condor collector. The service is running, but I can't see anything related to Condor in the web server. I do know that there is a file /etc/condor/ganglia.d/00_default_metrics, but how do we use this file to define what shows up in Ganglia? What am I missing?

Thank you!

Fernando


-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736