[Condor-users] Torture test



Hi,
I am writing a small frontend for bioinformatics tasks, which will be used by people who are largely unaware of the cluster behind it.
Since the cluster (128 CPUs) should work without continuous supervision, I ran some torture tests with many very small jobs. The result is zombie jobs: jobs which have finished successfully but are still listed as running on their nodes, slowly blocking the whole cluster.
Questions:
- Can this be avoided?
- If not: Is there a better way to get the system back in sync than to remove all jobs with the forcex option?


Cheers,

Ralf