
Re: [Condor-users] Maximum number of retries per job?



On Fri, Oct 01, 2004 at 04:39:49PM +0100, Angel de Vicente wrote:
> Hi,
> 
> I've been looking in the mailing list and the manual, but I cannot find a
> solution for something that should be a common problem: I just want to limit the
> number of times a particular job is restarted, for example to 10. Why?
> 
> I find that, for instance, if I specify an input file that actually doesn't
> exist, the job is restarted continuously until I notice and remove it from the
> queue. Another situation is when a user submits a very long job. The job will
> start, and after perhaps 10 hours of progress someone will start using the
> computer and the job will be killed, over and over again. This wastes a lot
> of CPU time, yet the job has little chance of ever finishing, so I'd like to
> remove it and send an e-mail to the user saying that he/she has a better chance
> of succeeding if the program is made to finish in a shorter time.
> 

Condor keeps a count of each job's restarts in its job ad.

You could set

periodic_remove = NumRestarts > 10

in your submit file, and the job will leave the queue once it has been
restarted more than 10 times.
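A minimal sketch of a complete submit description file using this expression. The executable and file names are hypothetical placeholders, and 10 is just the limit from the question. Since NumRestarts > 10 only becomes true on the 11th restart, use >= 10 if you want to stop at exactly 10. As an assumption worth checking against your version's list of job ClassAd attributes: depending on the Condor release and universe, ordinary job starts may be counted by a different attribute (later HTCondor releases also have NumJobStarts).

```
# Hypothetical submit description file; all file names are placeholders.
executable = my_long_job
input      = my_long_job.in
output     = my_long_job.out
error      = my_long_job.err
log        = my_long_job.log

# Remove the job from the queue once its restart count exceeds 10.
periodic_remove = NumRestarts > 10

queue
```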

(Alternatively, you could set periodic_hold, which leaves the job in the queue
but stops Condor from trying to start it again.)
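A sketch of the hold variant, assuming the same hypothetical submit file as before; only the policy line changes:

```
# Put the job on hold (instead of removing it) once its restart count exceeds 10.
periodic_hold = NumRestarts > 10
```

A held job stays visible in condor_q, so you can contact the user and then remove it with condor_rm. Releasing it with condor_release may simply put it back on hold at the next evaluation, because its restart count does not go down.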

-Erik

> Any ideas? Thanks a lot,
> Angel de Vicente
> -- 
> ----------------------------------
> http://www.iac.es/galeria/angelv/
> 
> PostDoc Software Support
> Instituto de Astrofisica de Canarias
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> http://lists.cs.wisc.edu/mailman/listinfo/condor-users