
Re: [Condor-users] Automatic Restart of Failed jobs.

You bet. This is where on_exit_remove comes in handy:

on_exit_remove = ExitCode =?= 0

This says: remove the job from the queue when it ends only if it exits with a value of zero. Otherwise the job goes back into the queue in the idle state and is run again.
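
As a rough sketch of how this could fit into a complete submit file: the wrapper script run_exonerate.sh, the query_N.fa and job_N file names, and the job count are hypothetical placeholders rather than details of your setup, and a shared file system is assumed.

    # Hypothetical wrapper script that runs Exonerate on one query file
    universe       = vanilla
    executable     = run_exonerate.sh
    arguments      = query_$(Process).fa
    output         = job_$(Process).out
    error          = job_$(Process).err
    log            = exonerate.log
    # Leave the queue only after a normal exit with code 0;
    # anything else puts the job back in the idle state
    on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
    # Hypothetical job count
    queue 10

The extra ExitBySignal test spells out that a job killed by a signal should also be rerun. Keep in mind that a job which fails every time will keep being rerun until you remove it by hand.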

See: http://www.cs.wisc.edu/condor/manual/v7.4/condor_submit.html#72329

- Ian

On 2010-10-05, at 10:32 PM, Edier Alberto Zapata Hernández <edalzap@xxxxxxxxx> wrote:

> Good night,
>  Today I was running some tests with Exonerate using Condor. I split the queries file into many files, each one with only 1 sequence in it. The problem is that some of the jobs failed: some because the node was down when I put the database on them, others because they crashed, and so on.
> I got the error files of all the jobs, but checking them one by one, finding each job's files, and restarting it is a little slow (the queries file has 13,600+ sequences). Is there a parameter in the submit file to specify that if a job fails (and only if it fails; if the job finishes OK, no action should be taken) Condor should try to restart it?
> Thank you.
> ----
> Edier Alberto Zapata Hernández
> Est. Ingeniería de Sistemas
> Universidad de Valle