
Re: [HTCondor-users] Setting up HTCondorView



Thank you very much Jason,

Since my last mail I have been trying to set up Ganglia with no success (because of my ignorance about setting it up).

I have first tried using a Docker image with an old Ganglia version and, after that, installing it directly on Debian 11. In both cases I was able to connect to all hosts by using the default configuration, but no HTCondor information is ever shown. I have looked into the condor_gangliad logs, but everything seems to work fine as far as I can tell (which is not much). I would like to know if the info is getting to Ganglia or not in order to identify where the problem lies (on the HTCondor configuration end or in the Ganglia configuration end), but I'm kinda lost.

As for Fifemon, I'll give it a look. It looks promising and I'm more comfortable working with Python, so maybe I'll have better luck.

I'll keep you posted if I end up figuring it out, in case anyone is interested in setting this up.

Javier Barbero


    
On 28/06/2022 at 17:11, Jason Patton wrote:

Javier,

The HTCondorView server hasn't been given much attention lately, so if it were to break in the future, fixing it would probably be low priority. A couple of options for monitoring your pool:

1. Ganglia is a bit old, but it's still supported: https://htcondor.readthedocs.io/en/latest/admin-manual/monitoring.html#ganglia

2. Fifemon was developed at Fermilab to push data to Graphite (which can then be displayed in Grafana): https://fifemon.github.io/

3. If you want to get really low level, you can use the Python bindings to poll the HTCondor daemons and push metrics to whatever system you like. Here's a link to the pre-rendered tutorial that shows how to query the Collector and Schedd: https://htcondor.readthedocs.io/en/latest/apis/python-bindings/tutorials/HTCondor-Introduction.html
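As an illustration of option 3, here is a minimal sketch of polling the Collector and Schedd with the Python bindings and reducing the slot ads to pool-wide totals. The function names (`summarize_slots`, `fetch_slot_ads`, `fetch_job_ads`) and the choice of attributes are assumptions made for this example, not something prescribed in this thread. It also assumes that slots advertise `Cpus`, `Memory` and (where configured) `GPUs`. The `htcondor` import is kept inside the fetch functions so that the aggregation can be tried without a pool.

```python
from collections import Counter


def summarize_slots(slot_ads):
    """Reduce an iterable of dict-like slot ads to rough pool-wide totals.

    Attributes missing from an ad (for example GPUs on a CPU-only node)
    are counted as zero.
    """
    totals = {"cpus": 0, "memory_mb": 0, "gpus": 0}
    states = Counter()
    for ad in slot_ads:
        totals["cpus"] += int(ad.get("Cpus", 0))
        totals["memory_mb"] += int(ad.get("Memory", 0))
        totals["gpus"] += int(ad.get("GPUs", 0))
        states[str(ad.get("State", "Unknown"))] += 1
    return totals, dict(states)


def fetch_slot_ads():
    """Ask the pool's Collector for slot (startd) ads."""
    import htcondor  # imported lazily; requires the HTCondor Python bindings

    collector = htcondor.Collector()
    return collector.query(
        htcondor.AdTypes.Startd,
        projection=["Name", "State", "Cpus", "Memory", "GPUs"],
    )


def fetch_job_ads():
    """Ask the local Schedd for a compact view of the job queue."""
    import htcondor  # imported lazily; requires the HTCondor Python bindings

    schedd = htcondor.Schedd()
    return schedd.query(
        projection=["ClusterId", "ProcId", "Owner", "JobStatus"],
    )
```

On a machine that can reach the pool, `summarize_slots(fetch_slot_ads())` returns the totals and a per-state slot count, which could then be pushed to whatever metrics system is in use. The totals are only a rough view: how partitionable and dynamic slots should be counted depends on the pool's configuration.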

Jason

On 6/25/22 12:40 PM, Javier Barbero Gómez wrote:

Yes, I should have looked into it in more detail... I was looking for a utility where admins and users can check the load on the cluster, maybe split up by resource (CPU, memory, GPU). Even better if it could also show the job queue, so that nobody has to run condor_q, all of this from the browser.

Even though the client is outdated, the server (which I assume is what is used by the condor_stats command) is still reporting some errors, like I said. Is it outdated too?

On 24/06/2022 at 22:37, Jason Patton wrote:

Hi Javier,

The HTCondor View client is a very old contributed module that we have not updated to keep up with the latest HTCondor releases. For example, the last time the "view_client-2.1-Any-Java.tar.Z" file linked from the wiki was modified was in 2011 (actual changes to the code may be even before that). It's not part of the HTCondor code base, and it is probably due to be removed from our documentation.

Can you let us know what kind of monitoring you had in mind and maybe we can help you find a more recent and better supported solution?

Jason Patton

On 6/23/22 2:30 PM, Javier Barbero wrote:

Hi everyone, I'm trying to set up the HTCondorView server in our cluster by following the instructions from the documentation (https://htcondor.readthedocs.io/en/latest/admin-manual/setting-up-special-environments.html#configuring-a-machine-to-be-a-htcondorview-server). By the way, even knowing that it is there, this section is very hard to find and to tell apart from the Client information.

I decided to use my existing collector as the HTCondorView collector by adding the following configuration:

POOL_HISTORY_DIR = $(LOCAL_DIR)/log/condorview
POOL_HISTORY_MAX_STORAGE = 322122547
KEEP_POOL_HISTORY = True

(also, is the POOL_HISTORY_MAX_STORAGE in bytes? It is not very well specified in https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#POOL_HISTORY_MAX_STORAGE)
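For reference, a small arithmetic sketch of what the configured value amounts to if it is read as bytes. That interpretation is an assumption here, since the thread does not confirm the unit, and `describe_bytes` is a hypothetical helper written only for this example.

```python
def describe_bytes(n_bytes):
    """Express a byte count in binary mebibytes and gibibytes."""
    return {"MiB": n_bytes / 1024**2, "GiB": n_bytes / 1024**3}


# Value of POOL_HISTORY_MAX_STORAGE from the configuration above,
# ASSUMED to be a number of bytes.
configured = 322122547
sizes = describe_bytes(configured)
print(f"{configured} bytes = {sizes['MiB']:.1f} MiB = {sizes['GiB']:.2f} GiB")
```

Under that assumption the setting corresponds to 0.3 GiB, truncated to a whole number of bytes.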

POOL_HISTORY_DIR resolves to "/var/log/condorview" and the "condor" user has been given ownership of this directory. After a few mistakes with the configuration, the View server appears to be running, but the response I get from condor_stats is strange. For example, if I run "condor_stats -userlist" I get the following (I have anonymized the user, host and domain, but the output is the same every time I try):

failed to receive data from the CondorView server
user@xxxxxxxxxx/main.domain.com

I have tried to set up the Client following its corresponding instructions (https://htcondor.readthedocs.io/en/latest/contrib-source-modules/view-client-contrib-module.html) and while running the setup command I get several messages saying "failed to receive data from the CondorView server". If I run the "./make_stats hour" command I also get the same message once.

I cannot see any error messages in the collector logs and the condorview logs look fine, although I don't really know what they should look like. Am I missing something?


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
