
Re: [HTCondor-users] File last modification time or job last write() attribute?



2016-05-26 14:06 GMT-04:00 Michael V Pelletier
<Michael.V.Pelletier@xxxxxxxxxxxx>:
> From: MIRON LIVNY <miron@xxxxxxxxxxx>
> Date: 05/26/2016 01:46 PM
>
>> You do not have an algorithm to decide when a job stopped making progress
>> based on its Output behavior after it consumed one hour of CPU time.
>>
>> What am I missing?
>
> Ah, I see what you're getting at now.
>
> Regardless of how much time the job has spent in slot, we can decide
> that it is hung and needs to be terminated if it has gone at least one
> hour (for example) without making any updates to a particular file.
>
>         -Michael Pelletier.


Hi,

I don't usually follow the threads in this forum very closely, but this
one caught my attention for a number of reasons.

Is it not possible in your case to have the job itself do this?
For example, it could fork a separate process that watches that file
and sends a signal to the main process when it sees no progress.
That would not require any extra HTCondor feature, right? Would
something like that work?
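
To make the idea concrete, here is a rough Python sketch of such a watcher. It is only an illustration, not something tested inside an HTCondor job: the names (is_stalled, watch, start_watchdog), the one-hour timeout, the 60-second poll interval and the choice of SIGTERM are all my own assumptions, and os.fork() limits it to Unix-like systems. Elapsed time is counted from the later of the watcher's start and the file's last modification, so a job is not signalled just because the file does not exist yet.

```python
import os
import signal
import time


def last_update(path, default):
    """Return the file's modification time, or `default` if it does not exist yet."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return default


def is_stalled(path, timeout, started, now=None):
    """True if `path` has gone more than `timeout` seconds without a write.

    Time is counted from the later of `started` and the file's mtime.
    """
    now = time.time() if now is None else now
    return now - max(started, last_update(path, started)) > timeout


def watch(path, pid, timeout=3600, poll=60, sig=signal.SIGTERM):
    """Poll `path` and send `sig` to `pid` once the file looks stalled."""
    started = time.time()
    while True:
        time.sleep(poll)
        if os.getppid() != pid:
            return  # main process is gone, nothing left to watch
        if is_stalled(path, timeout, started):
            os.kill(pid, sig)
            return


def start_watchdog(path, timeout=3600, poll=60):
    """Fork a child that watches `path` on behalf of the calling process."""
    main_pid = os.getpid()
    child = os.fork()
    if child == 0:
        try:
            watch(path, main_pid, timeout, poll)
        finally:
            os._exit(0)  # never fall back into the job's own code
    return child
```

The job would call something like start_watchdog("progress.log") once at startup (the file name is a placeholder) and then carry on with its real work. The main process can either die on the signal or install a handler that cleans up before exiting.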

Cheers,
Jose