Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Job is getting rerun instead of terminated

Date: Fri, 22 Jul 2005 11:12:31 -0500
From: Jaime Frey <jfrey@xxxxxxxxxxx>
Subject: Re: [Condor-users] Job is getting rerun instead of terminated

On Jul 22, 2005, at 5:26 AM, Andreas Vetter wrote:

> we have a setup that is meant to terminate all jobs after 12 hours
> runtime. Most jobs are vanilla universe. But sometimes there are jobs that
> are evicted after 12 hours and then started again on other nodes. The user
> finally killed the job with condor_rm. Other jobs are terminated after 12
> hours as expected.
>
> Attached is part 3 of our global condor config and the user's log for the
> restarting job.
>
> Did I miss something?

When an execute machine kills a job for running too long, the schedd doesn't consider the job complete. It thinks that the execute machine wasn't willing to let the job run long enough, and it now needs to find another machine that will let the job run to completion. Whether and when a job leaves the queue is controlled by the job ad in the schedd.
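As a toy illustration of the behaviour described above, the loop below shows why a job killed by the execute machine is simply matched again. This is a model, not Condor code: the `Job` class and the `schedule` and `evict_for_runtime` helpers are hypothetical, and only the JobStatus values (1 = Idle, 2 = Running) are taken from Condor's job ClassAd. The eviction puts the job back to Idle, and nothing in the job ad tells the schedd to drop it.

```python
from dataclasses import dataclass

# JobStatus values as used in Condor job ClassAds.
IDLE = 1
RUNNING = 2


@dataclass
class Job:
    status: int = IDLE
    runs: int = 0  # number of times the job has been started


def evict_for_runtime(job: Job) -> None:
    """Hypothetical execute-side kill after 12 hours of runtime.

    The job did not exit on its own, so from the schedd's point of view it
    is not complete: it simply goes back to the idle state.
    """
    job.status = IDLE


def schedule(job: Job, available_matches: int) -> Job:
    """Hypothetical schedd loop: an idle job is rematched while machines remain.

    `available_matches` only bounds this toy simulation; the real schedd keeps
    trying for as long as the job stays in the queue.
    """
    for _ in range(available_matches):
        if job.status != IDLE:
            break
        job.status = RUNNING
        job.runs += 1
        evict_for_runtime(job)  # the job never finishes within the limit
    return job


job = schedule(Job(), available_matches=3)
print(job)  # Job(status=1, runs=3)
```

The job ends up idle after every run, which is exactly the "evicted and started again on other nodes" pattern in the user's log.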

If you want your jobs to leave the queue when they run longer than 12 hours, you need to set periodic_remove in the job ads. If you want the jobs to stay in the queue but not get rerun, you need to modify the startd's requirements to not run jobs that previously ran for more than 12 hours.
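The two alternatives can be sketched as the conditions each daemon would evaluate. This is a Python model, not ClassAd syntax: the function names are hypothetical, and it assumes the job ad carries `EnteredCurrentStatus` (when the job entered its current state) and `RemoteWallClockTime` (run time accumulated by earlier runs). In a submit file the first condition would be written along the lines of `periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 43200)`; the second would be ANDed into the startd's START expression, referring to the job's attributes via `TARGET`. Attribute names should be checked against the manual for the Condor version in use.

```python
TWELVE_HOURS = 12 * 60 * 60  # 43200 seconds
RUNNING = 2  # JobStatus value for a running job


def periodic_remove(job_ad: dict, now: int) -> bool:
    """Schedd side: True means the job should be removed from the queue.

    Mirrors (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 43200).
    """
    return (job_ad["JobStatus"] == RUNNING
            and now - job_ad["EnteredCurrentStatus"] > TWELVE_HOURS)


def startd_accepts(job_ad: dict) -> bool:
    """Startd side: refuse jobs that have already run for 12 hours or more.

    The job stays in the queue; it just no longer matches this machine.
    A job that has never run may lack the attribute, so default to 0.
    """
    return job_ad.get("RemoteWallClockTime", 0) < TWELVE_HOURS


now = 1_000_000
long_runner = {"JobStatus": RUNNING,
               "EnteredCurrentStatus": now - 13 * 3600,
               "RemoteWallClockTime": 0}
print(periodic_remove(long_runner, now))  # True

requeued = {"JobStatus": 1,
            "EnteredCurrentStatus": now - 60,
            "RemoteWallClockTime": 12 * 3600}
print(startd_accepts(requeued))  # False
```

The practical difference: periodic_remove takes the job out of the queue entirely, while the START clause leaves it idle in the queue, where the user can still inspect it or remove it with condor_rm.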

+----------------------------------+---------------------------------+
| Jaime Frey | Public Split on Whether |
| jfrey@xxxxxxxxxxx | Bush Is a Divider |
| http://www.cs.wisc.edu/~jfrey/ | -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+
