
Re: [Condor-users] file transfer problems with vanilla job



On Fri, 12 Nov 2004 12:59:43 +0000  "Dr Ian C. Smith" wrote:

> - perhaps I'm missing something fundamental here ???

...

> The configuration file is setup so that the job
> gets a soft kill after 1 minute and hard kill after a
> further 10 minutes.
> Since the soft kill isn't trapped the job runs as expected for 11 minutes
> before going to the idle state.

ahh, that's the fundamental thing you're missing.  hardkill always
means we do not transfer any output back.  i mentioned that in passing
in another message, but i didn't realize anyone was relying on
hardkill to evict their jobs.

the intention of softkill is: "start getting off this machine now, the
owner wants it back soon."

the intention of hardkill is "ok buddy, time's up.  you took too long.
now i'm gonna kill you and remove every trace you left in the least
amount of time so the human will get its machine back right away".

if you wait until the softkill window has closed before you want
condor to *start* the process of transferring output data (which can
potentially be huge and take a while), you're just going to upset a
lot of machine owners.

so, word to the wise: if you use ON_EXIT_OR_EVICT, catch the softkill
signal, do any cleanup you need, and exit as soon as you can. ;) that
way, condor will still have a chance to transfer output before the
window of time to get off the machine is slammed shut.

-derek