Re: [Condor-users] Condor and monitoring performance

Michael, Ian -

The Condor Log Analyzer works on user-generated log files and is good
for observing the overall performance envelope of a large workload.
It's available to anyone who would like to use it:


A while back, we also did some work on analyzing user log files combined
with ClassAds to extract information of interest to administrators,
such as "jobs from user X tend to fail on machines with property Y":

David Cieslak, Nitesh Chawla, and Douglas Thain,
"Troubleshooting Thousands of Jobs on Production Grids Using Data
Mining Techniques,"
IEEE Grid Computing, pages 217-224, August 2008. DOI: 10.1109/GRID.2008.4662802
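
The paper applies data mining techniques to this problem. A much simpler
cross-tabulation, sketched below in Python, conveys the basic idea: group
finished jobs by owner and by one machine attribute, then flag the
combinations with a high failure rate. This is NOT the method from the
paper. The record layout is an assumption: each record is a dict joining
the job's Owner and ExitCode with an attribute of the machine it ran on
(here OpSys), as one might assemble from user logs and machine ClassAds
(e.g. the output of condor_status -long). The helper name, thresholds,
and sample data are all hypothetical.

```python
from collections import defaultdict

def failure_hotspots(records, attr, min_jobs=3, min_rate=0.5):
    """Hypothetical helper: find (owner, machine attribute) pairs that fail often.

    Each record is assumed to be a dict with "Owner", "ExitCode", and the
    machine attribute named by `attr`. A nonzero ExitCode counts as a failure.
    Returns (owner, attr value, failures, total) tuples, worst rate first.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        key = (rec["Owner"], rec[attr])
        totals[key] += 1
        if rec["ExitCode"] != 0:
            failures[key] += 1

    hotspots = []
    for (owner, value), n in totals.items():
        f = failures[(owner, value)]
        # Require a minimum job count so one failed job doesn't look like a pattern.
        if n >= min_jobs and f / n >= min_rate:
            hotspots.append((owner, value, f, n))
    return sorted(hotspots, key=lambda h: h[2] / h[3], reverse=True)

# Hypothetical sample: alice's jobs fail on WINDOWS machines but not on LINUX.
RECORDS = (
    [{"Owner": "alice", "OpSys": "WINDOWS", "ExitCode": c} for c in (1, 1, 1, 0)]
    + [{"Owner": "alice", "OpSys": "LINUX", "ExitCode": 0} for _ in range(3)]
    + [{"Owner": "bob", "OpSys": "WINDOWS", "ExitCode": 0} for _ in range(3)]
)

if __name__ == "__main__":
    for owner, value, f, n in failure_hotspots(RECORDS, "OpSys"):
        print(f"{owner} on OpSys={value}: {f}/{n} jobs failed")
```

Jobs that were held or aborted have no exit code, so a real version would
need to decide how to count them; this sketch simply ignores that case.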

I'm not aware of any general-purpose tool that works on the Condor
daemon log files, which are rather unstructured.
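
Unlike the daemon logs, user log files have a regular event structure:
each event starts with a three-digit event code, the job ID in
parentheses, and a timestamp, and ends with a line containing only "...".
As a rough illustration (a minimal Python sketch, not the Log Analyzer
itself), the code below counts events by type and records the last event
seen for each job, which is enough to spot jobs that were held or never
finished. It assumes the classic text user log format; the subset of
event codes in EVENT_NAMES is partial, and summarize_user_log and the
sample log are hypothetical.

```python
import re
from collections import Counter

# A subset of event codes from the Condor user log format.
EVENT_NAMES = {
    0: "submit",
    1: "execute",
    4: "evicted",
    5: "terminated",
    9: "aborted",
    12: "held",
    13: "released",
}

# An event header looks like: 005 (101.000.000) 07/05 18:10:42 Job terminated.
# Only the code and the cluster.proc part of the job ID are captured.
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) ")

def summarize_user_log(lines):
    """Count events by type and remember the last event seen for each job."""
    counts = Counter()
    last_event = {}
    for line in lines:
        m = HEADER.match(line)
        if not m:
            continue  # skip event body lines and "..." separators
        code = int(m.group(1))
        job = (int(m.group(2)), int(m.group(3)))  # (cluster, proc)
        name = EVENT_NAMES.get(code, f"other({code:03d})")
        counts[name] += 1
        last_event[job] = name
    return counts, last_event

# Hypothetical sample log: job 101.0 runs to completion, job 101.1 is held.
SAMPLE = """\
000 (101.000.000) 07/05 18:02:00 Job submitted from host: <10.0.0.1:9618>
...
000 (101.001.000) 07/05 18:02:00 Job submitted from host: <10.0.0.1:9618>
...
001 (101.000.000) 07/05 18:03:10 Job executing on host: <10.0.0.7:9618>
...
012 (101.001.000) 07/05 18:04:00 Job was held.
...
005 (101.000.000) 07/05 18:10:42 Job terminated.
\t(1) Normal termination (return value 0)
...
"""

if __name__ == "__main__":
    counts, last_event = summarize_user_log(SAMPLE.splitlines())
    for name, n in sorted(counts.items()):
        print(f"{name:>10}: {n}")
    unfinished = sorted(job for job, ev in last_event.items() if ev != "terminated")
    print("not terminated:", unfinished)
```

In practice you would pass an open file (for line in open(path)) instead
of the sample string; matching only the header prefix keeps the parser
indifferent to the exact timestamp format.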

Cycle Computing has some pretty darn cool tools that process the user
and machine ClassAds in a Condor pool and can slice and dice them into
many different kinds of reports. Certainly worth checking out.

Best, Doug

On Mon, Jul 5, 2010 at 6:02 PM, Ian Stokes-Rees
<ijstokes@xxxxxxxxxxxxxxxxxxx> wrote:
> On 7/1/10 9:07 AM, Michael O'Donnell wrote:
> > I was curious if anyone has suggestions on how to monitor the health of a
> > Condor pool. I am trying to track down an error (Q3) and was also trying to
> > develop a set of commands for monitoring Condor.
>
> Doug Thain, at Notre Dame, has a Condor Log Analyzer tool which may be
> useful to you. I'm working offline now, but if you google for it, I'm sure
> you'll find it. This mostly deals with standard Condor job log files or
> possibly also DAG log files, rather than service log files, but it may help.
>
> If it is any consolation, we also find it hard to figure out what is going
> on with Condor. Staring at service log files (with "tail -f") seems to be
> about the best we can do. We agree this is suboptimal.
>
> Ian
> --
> Ian Stokes-Rees, PhD                       W: http://abitibi.sbgrid.org
> ijstokes@xxxxxxxxxxxxxxxxxxx               T: +1.617.432.5608 x75
> NEBioGrid, Harvard Medical School          C: +1.617.331.5993
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/