Re: [Condor-users] Results on Remove?



On 10/27/05, bgore@xxxxxxxxxx <bgore@xxxxxxxxxx> wrote:
> Erik, if you put jobs on hold, does temp output come back to the
> submitter? What about if you issue a condor_vacate on a machine running
> a job -- then the temp output would be xfered back right?

Output from a prematurely stopped job (vacated, held, shadow crash*,
etc.) is *never* transferred back unless a checkpoint is perceived to
have happened.

Doing this by default would be a massive waste of I/O on a great many
people's farms.

If the temp output is useless from the point of view of getting my job
running on another machine, then why bother transferring it? The
default behaviour for non-checkpointed jobs should always be to get
off the machine as fast as possible and get onto another as fast as
possible.

If you need that temp output in normal operation, consider altering
your job to trap the WM_CLOSE/exit signal and exit cleanly, as well as
setting when_to_transfer_output = ON_EXIT_OR_EVICT in your submit file.
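
As a rough sketch of the "trap the signal and exit cleanly" part, this
is roughly what it might look like for a Python job on Unix. It assumes
the job is soft-killed with SIGTERM (on Windows the counterpart is the
WM_CLOSE message mentioned above), and the names request_stop, run and
partial_results.txt are made up for illustration, not anything Condor
defines:

```python
import os
import signal

# Set by the handler when an eviction signal arrives (hypothetical name).
stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only record the request; the main loop cleans up."""
    global stop_requested
    stop_requested = True


def run(work_items, out_path):
    """Process items until finished or until a stop has been requested.

    Whatever was completed is flushed to out_path before returning, so the
    file is in a usable state when it gets transferred back on eviction.
    Returns the number of items processed.
    """
    done = 0
    with open(out_path, "w") as out:
        for item in work_items:
            if stop_requested:
                break  # leave the loop early and fall through to the flush
            out.write(f"{item * item}\n")  # stand-in for the real work
            done += 1
        out.flush()
        os.fsync(out.fileno())
    return done


if __name__ == "__main__":
    # Assumption: the soft-kill signal is SIGTERM.
    signal.signal(signal.SIGTERM, request_stop)
    processed = run(range(1000), "partial_results.txt")
    print(f"processed {processed} items")
```

The handler only sets a flag, so the job finishes the item it is on and
closes its output properly rather than dying mid-write. The partial
file still only comes back if the transfer is enabled
ON_EXIT_OR_EVICT.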

By first sending a vacate message to the job, then holding it once it
has returned, the contents of the temp working directory will be
available in your spool directory.

If this output is considerable, though, you will waste a lot of time
and bandwidth transferring it when you don't need it.

Also note that in cases where you are using PREEMPT and cycle stealing,
you may find your users get annoyed by jobs not getting out of their
way fast enough (disk thrashing can be more annoying than CPU hogging).

Matt

* Obviously, a shadow crash means a checkpoint won't work anyway.