
Re: [HTCondor-users] File last modification time or job last write() attribute?



From: MIRON LIVNY <miron@xxxxxxxxxxx>
Date: 05/25/2016 02:29 PM

> Michael,
>
> Can you tell us how you plan to use this information. In other words "why
> do you care about when the last write took place?"
>
> Miron

Sure, professor: in some scenarios the only reasonable course of action is
to keep trying until the bitter, bitter end. And so if timing out is not an
option, then one doesn't put a timeout function into the code in the first
place.

I suppose it's in the same realm as Michelle Craft's asymptotic optimization
on slide nine, with its eight-hour deadline:
http://research.cs.wisc.edu/htcondor/HTCondorWeek2016/presentations/WedCraft_NEOS.pdf

The trick is detecting the asymptote as early as possible to minimize
badput time.

And so if a log file is supposed to have data written to it for each
time slice, for example, and nothing has appeared in it for far longer than
you'd expect a single time slice ought to take, then you can conclude that
you're not going to make any further forward progress and some action should
be taken. Since the job, as described above, won't terminate itself, it falls
to a periodic_hold or periodic_remove expression, which can compare that
last-write time against CurrentTime in order to trigger, imposing an
external timeout.
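
As a rough sketch of the comparison I have in mind, here it is in plain
Python, outside of HTCondor. The function names and the 600-second threshold
are made up for illustration, and it reads the file's modification time
straight from the filesystem, since whether a job attribute exposes that is
exactly what I'm asking about:

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds have passed since `path` was last modified."""
    if now is None:
        now = time.time()
    # os.path.getmtime returns the modification time as seconds since the epoch.
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds, now=None):
    """True if nothing has been written to `path` for longer than allowed."""
    return seconds_since_last_write(path, now) > max_idle_seconds


if __name__ == "__main__":
    # Hypothetical log file and threshold: one time slice should never take
    # anywhere near ten minutes, so ten minutes of silence means no progress.
    log_path = "job.log"
    if os.path.exists(log_path) and is_stalled(log_path, 600):
        print("no writes for over 600 seconds; hold or remove the job")
```

The threshold should sit well above the longest time slice you'd ever
legitimately expect, so that a slow but healthy job isn't cut off.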

        -Michael Pelletier