
Re: [HTCondor-users] CondorView is being used widely?

Hi Ian -

Thanks for your post.

As the original author (shudder) of the CondorView client, I am pleased to hear you find it useful. I have been working on a successor to CondorView. Currently I am doing this as a class project, but I hope to have something ready to share in late Dec or Jan. So far I am aiming for:
  - interactive JavaScript-based charts (instead of a Java applet)
  - easy to configure new charts without code changes
  - better built-in understanding of partitionable slots (the current CondorView usage graphs are a little wonky if you use a lot of partitionable slots)
  - leverage the classad-based statistics introduced in v7.8 [1]
  - lightweight, fast and easy to deploy, no additional infrastructure (like database servers) required

As for other tools to see what is happening in HTCondor, ones that come to mind include CycleServer and Cumin. I believe both of these can layer on top of an existing HTCondor pool, and go beyond just monitoring and move into administration.

Also, here at UW-Madison, we use Ganglia to monitor a lot of our IT infrastructure (beyond just HTCondor). At last year's Condor Week workshop, Becky Gietzel demonstrated Ganglia plugins that show all sorts of HTCondor info; see link [1] below. We are running several of them in our Ganglia installation. She said something about posting these to GitHub.

Finally, re ND's Condor Log Analyzer, you may be interested in the HTCondor config knob "EVENT_LOG" (see the Manual). This specifies a file that receives job user log events, but across all users and all of their jobs on a given schedd. I added a link to Condor Log Analyzer here:
I've been trying to collect an index of add-ons on this wiki page, and would love to hear about any additional ones.
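
As a rough illustration of what can be done with such a log, here is a minimal Python sketch that tallies events by their three-digit event number. It assumes each event starts with a header line of the form `NNN (cluster.proc.subproc) date time text` and is followed by a line containing only `...`, as in job user logs. The sample data and the function name `count_events` are made up for illustration, and the exact timestamp format may differ between HTCondor versions.

```python
import re
from collections import Counter

# Assumed header layout of a job user log event:
#   NNN (cluster.proc.subproc) <date> <time> <free text>
# e.g. "004 (123.000.000) 11/30 06:30:00 Job was evicted."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\S+) (\S+) (.*)$")


def count_events(lines):
    """Return a Counter mapping three-digit event numbers to occurrences."""
    counts = Counter()
    for line in lines:
        match = EVENT_HEADER.match(line.rstrip("\n"))
        if match:
            # Indented detail lines and "..." separators do not match
            # the header pattern and are skipped.
            counts[match.group(1)] += 1
    return counts


# Made-up sample data, not taken from a real pool.
SAMPLE_LOG = """\
000 (123.000.000) 11/30 06:03:00 Job submitted from host: <10.0.0.1:9618>
...
001 (123.000.000) 11/30 06:04:10 Job executing on host: <10.0.0.2:9618>
...
004 (123.000.000) 11/30 06:30:00 Job was evicted.
    (0) Job was not checkpointed.
...
"""

if __name__ == "__main__":
    for code, n in sorted(count_events(SAMPLE_LOG.splitlines()).items()):
        print(code, n)
```

To run it against a real file, pass an open file object instead of the sample, e.g. `count_events(open(path))`, where `path` is whatever EVENT_LOG is set to on your schedd. Event 004 (job evicted) is probably the first one to look at if badput is the concern.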


Hope the above helps,

On 11/30/2012 6:03 AM, Smith, Ian wrote:
I find CondorView absolutely indispensable for monitoring our Condor pool
and would be completely lost without it but it would be nice to have
some more sophisticated monitoring. This was raised at a (UK) Campus
Grids SIG [1] but I'm not sure if it got any further than that. One
nice feature would be to be able to get a plot of the User stats
broken down >per user<. Badput stats would also be very useful.

I also make heavy use of the Condor Log Analyzer from University
of Notre Dame [2] to monitor badput. Since we have a largish (1400 cores)
Windows based pool which can only run vanilla universe jobs this
is obviously a major concern. I automatically aggregate the stats and update
and publish them daily [3] which is a good way of "encouraging" users to
include their own checkpointing to reduce badput.

I'd be very interested to hear of any other monitoring tools.



[1] http://wikis.nesc.ac.uk/escinet/Campus_Grids
[2] http://condorlog.cse.nd.edu/
[3] http://condor.liv.ac.uk/analysis/

Dr Ian C. Smith,
Advanced Research Computing,
University of Liverpool, UK.

-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Seo-Young Noh
Sent: 30 November 2012 10:08
To: HTCondor-Users Mail List
Subject: [HTCondor-users] CondorView is being used widely?

Hi all,

I would like to monitor condor jobs (history etc.) on the web, and
CondorView seems like the right tool to consider. However, I could not
find complete information on installation and usage. I am wondering
whether CondorView is widely used to monitor jobs on the web, or whether
there is another popular way to monitor jobs.

Thank you.


Seo-Young Noh, Ph.D.

Supercomputing Center
Korea Institute of
Science and Technology Information
Email: rsyoung@xxxxxxxxxxx
Cell: +82 (0)10-2262-8014