Re: [Condor-users] Maximum number of retries per job?
- Date: Fri, 1 Oct 2004 12:01:19 -0500
- From: "Peter F. Couvares" <pfc@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Maximum number of retries per job?
Angel de Vicente wrote:
periodic_remove = NumRestarts > 10
1. could it be possible then to automatically add this to every user's
job description?
One kludgey-but-effective way to do this now is via a condor_submit
wrapper which adds a "-a periodic_remove = NumRestarts > 10" argument
to submits.
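
A minimal sketch of such a wrapper follows, written in Python purely as an illustration. The path in REAL_CONDOR_SUBMIT and the helper names build_argv and main are hypothetical; the sketch assumes the real binary has been moved or renamed so that the wrapper can be installed as condor_submit on the users' PATH. A command passed with -a is expected to be treated as if it came just before the queue statement, so it would likely take precedence over a periodic_remove in the user's own file; that is worth verifying on your version.

```python
import os
import sys

# Assumption: location of the real condor_submit binary after it has been
# moved aside. Adjust for your installation.
REAL_CONDOR_SUBMIT = "/usr/local/condor/bin/condor_submit.real"

# The policy expression quoted in this thread.
SITE_POLICY = "periodic_remove = NumRestarts > 10"


def build_argv(user_args, real_submit=REAL_CONDOR_SUBMIT, policy=SITE_POLICY):
    """Return the argument vector for the real condor_submit.

    The site policy is passed with -a ahead of the user's own arguments,
    so the submit description file name stays last on the command line.
    """
    return [real_submit, "-a", policy] + list(user_args)


def main():
    """Replace this process with the real condor_submit.

    Using exec means the exit status and output of condor_submit reach
    the user unchanged.
    """
    argv = build_argv(sys.argv[1:])
    os.execv(argv[0], argv)

# When installed as a script, end the file with a call to main().
```

With this in place, a user typing "condor_submit job.sub" would in effect run the real binary with -a "periodic_remove = NumRestarts > 10" job.sub.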
But we're working on a SYSTEM_PERIODIC_REMOVE config expression which
will allow the administrator to set a schedd-wide policy independent
from that which the users set in their personal periodic_remove. It
should be in an upcoming 6.7 series release (probably 6.7.3), but no
promises.
2. I wouldn't want to kill a standard universe job that has restarted
more than 10 times. Is there a way to differentiate between restarts in
the vanilla universe and restarts in the standard universe?
The universe of a job is advertised in its "Universe" attribute. Just
add that to the periodic_remove expression so it only becomes true for
the job universes you want.
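
As a sketch of what such an expression could look like, the helper below composes a periodic_remove line that only becomes true for selected universes. It assumes the attribute appears in the job ad as JobUniverse and holds an integer code (5 for vanilla, 1 for standard). Neither the spelling nor the codes are stated in this message, so check the output of condor_q -l on your pool before relying on them. The function name restart_limit_expr is hypothetical.

```python
# Assumption: integer universe codes as they appear in the job ad.
STANDARD = 1
VANILLA = 5


def restart_limit_expr(max_restarts, universes):
    """Build a periodic_remove line limited to the given universe codes."""
    if not universes:
        raise ValueError("at least one universe code is required")
    # One equality test per universe, OR-ed together.
    clause = " || ".join(f"JobUniverse == {code}" for code in universes)
    return f"periodic_remove = ({clause}) && NumRestarts > {max_restarts}"


# Only vanilla jobs are removed after more than 10 restarts;
# standard universe jobs are left alone.
print(restart_limit_expr(10, [VANILLA]))
```

The resulting line can go into a submit description file, or be passed with -a as described earlier in this message.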
-Peter
--
Peter Couvares University of Wisconsin-Madison
Condor Project Research Department of Computer Sciences
pfc@xxxxxxxxxxx 1210 W. Dayton St. Rm #4241
(608) 265-8936 Madison, WI 53706-1685