
Re: [HTCondor-users] File last modification time or job last write() attribute?



From: Jose Caballero <jcaballero.hep@xxxxxxxxx>
Date: 05/26/2016 02:24 PM
 
> Is it not possible in your case to have the actual job to do it?


In the real world, timing out is not an option for some tasks, so
there are no timeouts in the code in that situation. You can kill a
stalled job off in the lab, of course, but that has to be done from
outside the job.

> Something like forking a separate process that watches over that file,
> and sends a signal to the main process when it does not see
> progress...
> That does not require any extra HTCondor feature, right? Would
> something like that work?

You can't fork a daemon in a +PreCmd since all those processes get
killed when the job starts, but you could do it in a user_job_wrapper.

That might be preferable to a hook in some ways, but having an extra
process hanging around doing nearly nothing rubs me the wrong way.
I like the way the update_job_info hook is spawned automatically and
has minimal requirements and overhead. A wrapper-spawned daemon, though,
would avoid potential issues if STARTER_UPDATE_INTERVAL were set to
an excessive value, since the daemon would control its own interval,
and it could check a job attribute to let the user set that interval.
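
A rough sketch of what such a wrapper-spawned watcher could look like,
in Python. This is only an illustration of the idea, not anything from
HTCondor itself: the function names, the stall limit, and the on_stall
callback are all hypothetical, and how the polling interval would be
read from a job attribute is left out.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def watch(path, stall_limit, interval, on_stall,
          sleep=time.sleep, max_checks=None):
    """Poll `path` every `interval` seconds.

    Calls on_stall(age) once, and returns True, as soon as the file has
    gone unmodified for more than `stall_limit` seconds. Returns False
    if `max_checks` polls pass without a stall (None means poll forever).
    All parameter names here are illustrative assumptions.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        age = seconds_since_modified(path)
        if age > stall_limit:
            on_stall(age)
            return True
        checks += 1
        sleep(interval)
    return False
```

The on_stall callback is where the watcher would signal the job or
record the stall somewhere; which of those to do is a policy choice.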

I like how my hook sets a job attribute rather than trying to
signal the process itself, since that lets the submission set the
policy on what to do in a given scenario, rather than leaving that
decision to the hook author.
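
To illustrate the attribute approach, here is a minimal Python sketch
that turns a file's age into a ClassAd-style "Name = value" line. The
attribute name FileStallSeconds is made up for the example, and the
step that actually gets the value into the job ad is not shown, since
that depends on how the hook is wired up.

```python
import os
import time


def stall_attribute(path, attr_name="FileStallSeconds", now=None):
    """Format the whole seconds since `path` was last modified as an
    attribute assignment. `attr_name` is a hypothetical attribute name.
    """
    if now is None:
        now = time.time()
    # Clamp at zero in case the file's mtime is ahead of the clock.
    age = max(0, int(now - os.stat(path).st_mtime))
    return "%s = %d" % (attr_name, age)


# With the attribute in the job ad, the submit description could carry
# the policy itself, for example (assumed threshold):
#   periodic_remove = FileStallSeconds > 3600
```

The hook only reports the number; whether a stall means hold, remove,
or nothing at all stays with whoever writes the submit description.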

        -Michael Pelletier.