
Re: [Condor-users] Automatic Restart of Failed jobs.

On 10/05/2010 10:32 PM, Edier Alberto Zapata Hernández wrote:
Good evening,
  Today I was running some tests with Exonerate using Condor. I split the
query file into many files, each with only one sequence in it. The
problem is that some of the jobs failed: some because their node was down
when I put the database on the nodes, others because they crashed, and so on.

I have the error files for all the jobs, but checking them one by one,
finding each job's files, and restarting it is rather slow (the query file
has 13,600+ sequences). Is there a parameter in the submit file to specify
that if a job fails (and only if it fails; if the job finishes OK, no
action should be taken), Condor should try to restart it?

Thank you.

Edier Alberto Zapata Hernández
Est. Ingeniería de Sistemas
Universidad de Valle

Hopefully you can identify failed runs by the process's exit code, in which case you should consider on_exit_remove.
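
As a rough sketch of what that could look like in a submit file: on_exit_remove is evaluated when the job exits, and if it is False the job goes back to idle and is run again rather than leaving the queue. The file names below (run_exonerate.sh, query_N.fa, and so on) and the queue count are hypothetical, and the expression assumes your Exonerate wrapper returns a non-zero exit code when a run fails.

```
# Sketch only; executable, file names and queue count are placeholders.
universe   = vanilla
executable = run_exonerate.sh
arguments  = query_$(Process).fa
output     = out.$(Process)
error      = err.$(Process)
log        = exonerate.log

# Leave the queue only after a normal exit with code 0.
# A crash (exit by signal) or a non-zero exit code keeps the job
# in the queue, so Condor runs it again.
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)

# One job per query file; adjust to the number of files you have.
queue 10
```

Note that with this expression a job that fails every time is retried indefinitely, so it is worth watching the queue for jobs that never leave it.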


Or maybe on_exit_hold, if you want a chance to fix up files or the database server before the job is retried.
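
A comparable sketch for the hold variant, again assuming that a failed run shows up as a non-zero exit code or as an exit by signal. This line would go in the same kind of submit file, before the queue statement.

```
# Put the job on hold instead of rerunning it right away
# when it crashed or returned a non-zero exit code.
on_exit_hold = (ExitBySignal == True) || (ExitCode != 0)
```

Held jobs stay in the queue until you act on them: condor_q -hold lists them, and once the files or the database are fixed, condor_release followed by the job or cluster ID puts them back to idle so they can run again.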