
Re: [condor-users] Output file getting lost

On Sat, 27 Mar 2004, Michal Sankot wrote:

> Hi,
> I'm running Condor on 20 computers which transform given file and I
> encountered problems, that sometimes the output isn't returned. It does
> a lot of transformations and is run overnight and when one of the files
> isn't returned, it gets stuck for the whole night, which is really painful,
> as we are short of time.
> I had to kill the job after some time, so there is command 404, but
> notice that the time changed *back* (from 15:02 to 14:57). Could that cause
> such a malfunction?
> On all my clients I have installed Automachron, which synchronizes time
> every 30 seconds (with ntp.maths.tcd.ie). So, these system time
> fluctuations shouldn't happen.
> Would anybody know the cause of that problem (and how to fix it)?

The system clock going backwards is definitely the cause of your problem.
If you don't set "transfer_output_files" in your submit file, when your
job completes, Condor will transfer back all the files that have been
modified or created in the job's execute directory since the job began
running. If the system clock is reset backwards, then the job's output
files can look like they were last modified before the job began running,
in which case Condor won't transfer them.

The best solution is to use a clock synchronization daemon that doesn't
cause the system time to go backwards. Many work by making the clock
advance more quickly or slowly until it matches the desired time (so that
it's always increasing). A quick fix for the output file problem is to set
transfer_output_files in your Condor submit file. Then Condor will
transfer those files back no matter what timestamp they have.
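
As a rough sketch of that quick fix, a submit file along these lines
should do it. I don't know your actual setup, so the executable name
(transform.exe), the file names (input.dat, output.dat) and the log file
names are only placeholders; substitute your own:

```
# Hypothetical submit file; all file names here are placeholders.
universe                = vanilla
executable              = transform.exe
arguments               = input.dat output.dat

# Turn on Condor's file transfer and send results back when the job exits.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat

# Name the output explicitly, so Condor does not have to rely on
# file timestamps to decide what to bring back.
transfer_output_files   = output.dat

output                  = transform.out
error                   = transform.err
log                     = transform.log
queue
```

If a job writes several result files, list them all in
transfer_output_files, separated by commas.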

|             Jaime Frey             |There are 10 types of people in|
|         jfrey@xxxxxxxxxxx          |the world: Those who understand|
|   http://www.cs.wisc.edu/~jfrey/   |  binary, and those who don't  |