
Re: [HTCondor-users] Put jobs on hold if output or error files grow large?

Hi Carsten,

Would MAX_TRANSFER_OUTPUT_MB be what you are looking for?

That places the job on hold if the final output files (all of them in aggregate) are above a certain size.
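For example, in the pool configuration it could look like this (a sketch; the 100 MB limit is just an illustrative value):

```
# condor_config sketch: put a job on hold when the aggregate size of
# its transferred output files exceeds 100 MB (hypothetical value)
MAX_TRANSFER_OUTPUT_MB = 100
```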

(See http://research.cs.wisc.edu/htcondor/manual/v8.1/3_3Configuration.html)
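If you would rather do it per-job, the DiskUsage job attribute might be close enough for a periodic_hold expression — note it is updated only periodically, is expressed in KiB, and covers the whole scratch sandbox rather than just the err file. A sketch for the submit file, with a hypothetical 100 MB threshold:

```
# submit-file sketch: hold the job once its sandbox grows beyond ~100 MB
# (DiskUsage is in KiB and counts all scratch files, not only stderr)
periodic_hold        = DiskUsage > 102400
periodic_hold_reason = "Sandbox grew beyond 100 MB"
```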


On Jul 23, 2014, at 2:03 PM, Carsten Aulbert <Carsten.Aulbert@xxxxxxxxxx> wrote:

> Hi
> we have some DAGMan-based pipelines which may or may not cause massive
> trouble depending on the input data set. The first sign of trouble is
> that enormous amounts of data are written to stderr and thus end up in
> the file referenced by "err" in the submit file.
> Obviously, the correct fix would be to change the programs to detect
> this themselves, but given that this pipeline is (a) complex, (b) partly
> ancient, and (c) the exact location of the problem may shift on top of
> all that complexity, I'm currently looking for a way to put these jobs
> on hold once the error log file grows beyond, say, 10 or 100 MByte.
> We tried "periodic_hold" first, but I'm not sure there is a way to check
> file sizes there; browsing through the manual did not reveal anything
> that really matches.
> Has anyone ever tried this (or did I just miss an obvious way)?
> Cheers
> Carsten
> -- 
> Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
> Callinstrasse 38, 30167 Hannover, Germany
> phone/fax: +49 511 762-17185 / -17193
> https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/ATLAS/WebHome
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/