Does anyone have suggestions on how to monitor the health of a Condor pool? I am trying to track down an error (Q3) and would also like to develop a set of commands for monitoring Condor.

Doug Thain, at Notre Dame, has a Condor Log Analyzer tool which may be useful to you. I'm working offline now, but if you google for it, I'm sure you'll find it. It mostly deals with standard Condor job log files, or possibly also DAG log files, rather than service log files, but it may help.

If it is any consolation, we also find it hard to figure out what is going on with Condor.  Staring at service log files (with "tail -f") seems to be about the best we can do.  We agree this is suboptimal.
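
If it helps, here is a rough sketch of how the "tail -f" approach could be made a little less manual: a small Python script that follows a log file as it grows and prints only the lines that look like trouble. The keyword list, the function names (follow, suspicious_lines) and the example path in the comments are my own assumptions for illustration, not anything Condor provides, so adjust them to what your daemons actually write.

```python
import re
import time
from typing import Iterable, Iterator, TextIO

# Assumed keywords that often indicate a problem; this is NOT an official
# Condor list, so tune it to the messages you see in your own logs.
PATTERN = re.compile(r"error|fail|exception", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str],
                     pattern: "re.Pattern[str]" = PATTERN) -> Iterator[str]:
    """Yield only the lines that match the pattern, without the newline."""
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


def follow(handle: TextIO, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield complete lines appended to an open file, like `tail -f`.

    Starts at the current end of the file and never returns, so stop it
    with Ctrl-C. It does not notice log rotation (hypothetical helper).
    """
    handle.seek(0, 2)  # jump to the end of the file
    pending = ""
    while True:
        chunk = handle.readline()
        if not chunk:
            time.sleep(poll_seconds)  # nothing new yet, wait and retry
            continue
        pending += chunk
        if pending.endswith("\n"):  # only emit once the line is complete
            yield pending
            pending = ""


# Example use (the path is an assumption; use your own service log file):
#
#     with open("/var/log/condor/SchedLog", errors="replace") as log:
#         for hit in suspicious_lines(follow(log)):
#             print(hit)
```

Because suspicious_lines accepts any iterable of strings, the same filter can also be run over an already-written log file by passing the open file object directly instead of follow(log).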


