
Re: [Condor-users] Job continually being run due to shadow exception errors.



On Feb 14, 2006, at 10:06 PM, <Greg.Hitchen@xxxxxxxx> wrote:

Below are some log files from a job submission that appears to run OK and
produce the correct program output BUT condor considers it to have failed
and keeps it in the queue and keeps resubmitting it and re-executing it.

The job is a Monte Carlo simulation that can be limited to run for
X amount of time. I have set it to run for 10 minutes of CPU time.

The strange thing is that the condor job log file is there, even though the
log files below indicate that the file transfer fails, and therefore causes
the starter to exit, which in turn causes the shadow exception error,
which is why condor keeps trying to run it all the time.

The Condor user log (what you call the Condor Job Log) is written on the submit side, so its contents are unaffected by the file transfer problems (other than noting the failures).

For files that are transferred from the execute machine, Condor creates empty copies when the job is submitted to verify that it can write to them later (when the job completes).
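To make that submit-side check concrete, here is a minimal Python sketch of the idea, assuming it amounts to creating an empty file for each expected output name in the submit directory and recording any name that cannot be created. It is an illustration only, not Condor's code; `precreate_outputs` and the file names other than D7EG9AB.condorlog are hypothetical.

```python
import tempfile
from pathlib import Path


def precreate_outputs(submit_dir, output_files):
    """Create an empty placeholder for each expected output file.

    Hypothetical illustration of a submit-side writability check.
    Returns the names that could not be created (e.g. missing or
    unwritable directory).
    """
    failures = []
    for name in output_files:
        path = Path(submit_dir) / name
        try:
            # Append mode creates the file if it is missing and, as a choice
            # made for this sketch, does not truncate an existing file.
            with open(path, "a"):
                pass
        except OSError:
            failures.append(name)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # The second name points into a subdirectory that does not exist.
        print(precreate_outputs(d, ["D7EG9AB.condorlog", "no_such_dir/out.dat"]))
        # prints ['no_such_dir/out.dat']
```

This is why an empty output file can appear on the submit machine even when the job never produced it: the placeholder was created at submit time, not transferred back.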

As Matt noted, it looks like you specified D7EG9AB.condorlog to be transferred, but your job isn't creating the file.
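One way to confirm this diagnosis is to check, at the end of the job and in its working directory, which of the names listed for transfer (presumably via `transfer_output_files` in the submit description file) actually exist. The sketch below is a hypothetical helper, not part of Condor; `missing_outputs` and its command-line usage are illustrative assumptions.

```python
import sys
from pathlib import Path


def missing_outputs(workdir, output_files):
    """Return the declared output files that do not exist in workdir."""
    return [name for name in output_files if not (Path(workdir) / name).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: run from the job's scratch directory after the
    # simulation finishes, passing the same names listed for transfer.
    missing = missing_outputs(".", sys.argv[1:])
    for name in missing:
        print(f"declared output not created: {name}", file=sys.stderr)
    # A nonzero exit flags that at least one declared file is absent.
    sys.exit(1 if missing else 0)
```

Run as the last step of a wrapper script with `D7EG9AB.condorlog` as an argument, it would report that file if the program never wrote it. The fix is then either to have the program create the file or to remove it from the list of files to transfer.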

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+