
Re: [Condor-users] Testing systems services


On 10/05/11 16:51, Burnett, Ben wrote:

So I've been trying to manage a small Condor pool (~200 cores) over the last little while, and I've run into a small, irritating issue. I wondered if others have experienced the same thing, or if they have solutions/ideas.

So I have configured the pool to do various helpful things, like accept GPU jobs, provide dynamic slots on some of the more capable machines, etc.  What I have found though, is that once I've set the configuration, I rarely revisit it.  This means that if it stops working, I won't know until someone complains.  This might contribute to a decreased workload, since if no one complains, then it does not need to be fixed; however, it is more generally the case that I do get complaints, and generally they arrive in my inbox near strict deadlines (not that anyone ever leaves things to the last minute :P).

Does anyone have a relatively simple system to continuously test their pool's services?  Ideally, I'd like the test jobs to run with very low priority, so as not to interfere with regular workloads, but I would like them to run at least once a day (or as often as practically possible), and keep track of the results (this could just be an email, or a log file).  Then, if one job fails, I'd like to be emailed about it.

I can think of a few approaches myself, but I thought I'd ask if anyone has already got something similar up and running.

Sorry to reply so late. Did you have a look at Hawkeye? http://www.cs.wisc.edu/condor/hawkeye/

It's been a while since I last used it, but it was very easy to use, and out of the box you could check a lot of useful things like disk space, logged-on users, etc.
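
If a hand-rolled check is preferred, the kind of low-priority daily probe described in the question could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested setup: the helper names (build_submit_description, build_alert), the probe names, the addresses and the HasGPU machine attribute are all hypothetical, and it assumes that the nice_user submit command is available in your Condor version to run the job at very low priority. Submitting the generated file (for example from cron with condor_submit) and collecting the exit codes are left out.

```python
from email.message import EmailMessage


def build_submit_description(name, executable, requirements=None):
    """Return the text of a submit description file for one probe job."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"log = {name}.log",
        f"output = {name}.out",
        f"error = {name}.err",
        # Assumption: nice_user makes the job run at very low priority,
        # so it should not compete with regular workloads.
        "nice_user = True",
    ]
    if requirements:
        # Restrict the probe to the machines whose feature is being tested.
        lines.append(f"requirements = {requirements}")
    lines.append("queue")
    return "\n".join(lines) + "\n"


def build_alert(results, recipient, sender="condor-probe@localhost"):
    """Return an EmailMessage listing failed probes, or None if all passed.

    results maps a probe name to its exit code (0 means success).
    """
    failed = {name: code for name, code in results.items() if code != 0}
    if not failed:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Condor pool probe: {len(failed)} of {len(results)} failed"
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join(
        f"{name}: exit code {code}" for name, code in sorted(failed.items())
    )
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    # HasGPU is a hypothetical machine attribute used only for illustration.
    print(build_submit_description("probe_gpu", "/bin/true", "HasGPU =?= True"))
    alert = build_alert(
        {"probe_gpu": 1, "probe_dynamic_slot": 0}, "admin@example.org"
    )
    if alert is not None:
        print(alert)
```

The message is only built, not sent; handing it to smtplib or a local mail command is a site-specific choice. Returning None when every probe passes keeps the daily run quiet unless something actually breaks.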

Ángel de vicente

High Performance Computing Support PostDoc
Instituto de Astrofísica de Canarias