Re: [Condor-users] Maximum number of retries per job?
- Date: Fri, 1 Oct 2004 10:45:48 -0500
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Maximum number of retries per job?
On Fri, Oct 01, 2004 at 04:39:49PM +0100, Angel de Vicente wrote:
> I've been looking in the mailing list and the manual, but I cannot find a
> solution for something that should be a common problem: I just want to limit the
> number of times a particular job is restarted, for example to 10. Why?
> I find that, for instance, if I specify an input file that actually doesn't
> exist, the job is started continuously until I notice and remove it from the
> queue. Another situation is when a user submits a very long job. The job will
> start, and after perhaps 10 hours of progress, a user will use the computer and
> the job will be killed, over and over again. This process uses lots
> of CPU but has little chance of ever finishing, so I'd like to remove
> it and send an e-mail to the user saying that he/she has a better chance of
> succeeding if the program is made to finish in a shorter time.
Condor records the number of restarts for each job in the job ad, as the
NumRestarts attribute. You could set
periodic_remove = NumRestarts > 10
in your submit file, and the job will leave the queue after it has been
restarted more than 10 times.
(Or you could set periodic_hold with the same expression, which leaves the job
in the queue but doesn't try to start it any more.)
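
As a rough sketch, a submit file using this might look like the following.
The executable and log file names are placeholders I made up for
illustration, and the threshold of 10 is just the figure from the question:

    # Hypothetical submit file; my_job and my_job.log are placeholder names.
    executable = my_job
    log        = my_job.log

    # Remove the job from the queue once it has been restarted more than 10 times.
    periodic_remove = NumRestarts > 10

    # Alternative: keep the job in the queue, on hold, instead of removing it.
    # Use this line in place of periodic_remove above.
    # periodic_hold = NumRestarts > 10

    queue

With periodic_hold the job stays visible in condor_q, so the user can look
at it and decide whether to release or remove it.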
> Any ideas? Thanks a lot,
> Angel de Vicente
> PostDoc Software Support
> Instituto de Astrofisica de Canarias