
Re: [Condor-users] Job is getting rerun instead of terminated

Thank you for the explanation. I didn't realize that.

Jobs that run for more than 12 hours should be removed from the queue. The
users will not add an expression like that to their job ads, because they
simply forget. How could I do this with config files? Is there something
like "defaults for submitting jobs" that I could change?

On Fri, 22 Jul 2005, Jaime Frey wrote:

> On Jul 22, 2005, at 5:26 AM, Andreas Vetter wrote:
> > we have a setup that is meant to terminate all jobs after 12 hours of
> > runtime. Most jobs are vanilla universe. But sometimes there are jobs that
> > are evicted after 12 hours and then started again on other nodes. The user
> > finally killed the job with condor_rm. Other jobs are terminated after 12
> > hours as expected.
> > 
> > Attached is part 3 of our global condor config and the users log for the
> > restarting job.
> > 
> > Did I miss something?
> When an execute machine kills a job for running too long, the schedd doesn't
> consider the job complete. It thinks that the execute machine wasn't willing
> to let the job run long enough and it now needs to find another machine that
> will let the job run to completion. Whether and when a job leaves the queue
> is controlled by the job ad in the schedd.
> If you want your jobs to leave the queue when they run longer than 12 hours,
> you need to set periodic_remove in the job ads. If you want the jobs to stay
> in the queue but not get rerun, you need to modify the startd's requirements
> to not run jobs that previously ran for more than 12 hours.
> +----------------------------------+---------------------------------+
> |            Jaime Frey            |  Public Split on Whether        |
> |        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
> |  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
> +----------------------------------+---------------------------------+

 Andreas Vetter
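
For illustration, the periodic_remove approach mentioned in the quoted reply
could look roughly like the submit description file below. This is a minimal
sketch, not taken from the thread: the executable and log file names are
hypothetical, and the expression assumes the usual job ad attributes
JobStatus (2 means running), CurrentTime and EnteredCurrentStatus are
available in the Condor version in use.

```
# Sketch of a submit description file (file names are hypothetical).
universe        = vanilla
executable      = my_job
log             = my_job.log

# Remove the job from the queue once it has been in the running state
# (JobStatus == 2) for more than 12 hours (12 * 3600 seconds).
# The timer restarts each time the job starts running again.
periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > (12 * 3600))

queue
```

Because the expression is stored in the job ad and evaluated by the schedd,
the job leaves the queue instead of being matched to another machine after
the execute node evicts it.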