
[HTCondor-users] Put jobs on hold if output or error files grow large?


We have some DAGMan-based pipelines which may or may not cause massive
trouble depending on the input data set. The first sign of trouble is
that enormous amounts of data are written to stderr and thus end up in
the file referenced by "err" in the submit file.

Obviously, the correct fix would be to make the programs detect this
themselves, but given that (a) the pipeline is complex, (b) parts of it
are ancient, and (c) the exact location of the problem may shift over
time, I'm currently looking for a way to put such jobs on hold once the
error log file grows beyond, say, 10 or 100 MByte.
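One workaround I've been considering (a sketch, not a proper hold: all names here are illustrative, and the sentinel exit code 99 is my own convention) is a small wrapper script that runs the real payload with stderr redirected to a file, polls the file's size, and kills the payload once it exceeds a cap:

```shell
#!/bin/sh
# Hypothetical wrapper: run the payload with stderr captured to a file,
# poll the file size, and kill the payload once it exceeds a cap.
# HTCondor then sees the sentinel exit code, which an on_exit_hold
# expression in the submit file can turn into a hold.

# run_with_stderr_cap LIMIT_BYTES ERRFILE COMMAND [ARGS...]
run_with_stderr_cap() {
    limit=$1
    errfile=$2
    shift 2

    "$@" 2>"$errfile" &          # start the payload, stderr -> $errfile
    pid=$!

    while kill -0 "$pid" 2>/dev/null; do
        size=$(wc -c <"$errfile" 2>/dev/null || echo 0)
        if [ "$size" -gt "$limit" ]; then
            kill "$pid" 2>/dev/null    # note: does not kill grandchildren
            wait "$pid" 2>/dev/null
            return 99                  # sentinel: "stderr grew too large"
        fi
        sleep 1                        # poll interval; tune as needed
    done
    wait "$pid"                        # propagate the payload's exit code
}
```

In the submit file, the wrapper would become the executable and something like "on_exit_hold = (ExitCode == 99)" would convert the sentinel into a hold.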

We tried "periodic_hold" first, but I'm not sure there is a way to
check file sizes in that expression; browsing through the manual did
not reveal anything that really matches.
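The closest thing I've found so far (assuming the err file lives in the job's scratch directory on the execute node) is keying a periodic_hold expression off the DiskUsage job attribute, which HTCondor updates periodically. It measures the whole sandbox in KiB rather than just stderr, so it's only a coarse proxy; the threshold below is illustrative:

```
# Sketch of a submit-file fragment; DiskUsage is in KiB and covers the
# whole job sandbox, not only the stderr file.
err           = job.err
periodic_hold = DiskUsage > 102400
periodic_hold_reason = "Job sandbox exceeded 100 MB (possibly runaway stderr)"
```

This obviously misfires if the job legitimately produces large output files in the sandbox.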

Has anyone ever tried this (or did I just miss an obvious way)?



Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
phone/fax: +49 511 762-17185 / -17193