Re: [Condor-users] Maximum number of retries per job?



Angel de Vicente wrote:
periodic_remove = NumRestarts > 10

1. could it be possible then to automatically add this to every user's job
description?

One kludgey-but-effective way to do this now is via a condor_submit wrapper which adds a "-a periodic_remove = NumRestarts > 10" argument to every submission.
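
A minimal sketch of such a wrapper, written in Python purely for illustration. The path to the real binary (REAL_CONDOR_SUBMIT) and the function names are assumptions, not part of Condor; the only Condor-specific piece is the "-a" argument quoted above. The sketch does not try to stop a user from supplying their own periodic_remove.

```python
import os
import sys

# Assumed location of the real condor_submit binary (hypothetical path);
# the wrapper itself would be installed under the name users invoke.
REAL_CONDOR_SUBMIT = "/usr/local/condor/bin/condor_submit.real"

# The policy expression from this thread.
DEFAULT_POLICY = "periodic_remove = NumRestarts > 10"


def build_submit_args(user_args, policy=DEFAULT_POLICY):
    """Return the argv for the real condor_submit with the policy added via -a.

    The -a option is placed before the user's own arguments so that it
    precedes the submit description file name.
    """
    return [REAL_CONDOR_SUBMIT, "-a", policy] + list(user_args)


def main():
    """Replace this process with the real condor_submit."""
    argv = build_submit_args(sys.argv[1:])
    os.execv(argv[0], argv)

# When installed as the wrapper script, end the file with a call to main().
```

With this in place, "condor_submit job.sub" would run the real binary as "condor_submit.real -a 'periodic_remove = NumRestarts > 10' job.sub".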


But we're working on a SYSTEM_PERIODIC_REMOVE config expression which will let the administrator set a schedd-wide policy, independent of whatever users set in their own periodic_remove. It should be in an upcoming 6.7 series release (probably 6.7.3), but no promises.

2. I wouldn't want to kill a standard universe job that has restarted more than
10 times. Is there a way to differentiate between restarts in the vanilla
universe and restarts in the standard universe?

The universe of a job is advertised in its "Universe" attribute. Just add that to the periodic_remove expression so it only becomes true for the job universes you want.
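
As a rough illustration of a universe-restricted expression, the helper below builds the submit line a wrapper could pass via "-a". It assumes the universe appears in the job ClassAd as the integer attribute JobUniverse, with 5 meaning vanilla and 1 meaning standard; check those names and codes against the manual for your Condor version. The helper name and the UNIVERSE_CODES table are hypothetical.

```python
# Assumed universe codes (verify against your Condor version's manual).
UNIVERSE_CODES = {"standard": 1, "vanilla": 5}


def guarded_periodic_remove(universes, max_restarts=10):
    """Build a periodic_remove line that only fires for the given universes."""
    # One equality test per universe, OR-ed together.
    tests = " || ".join(
        f"JobUniverse == {UNIVERSE_CODES[name]}" for name in universes
    )
    # The restart limit applies only when one of the universe tests matches.
    return f"periodic_remove = ({tests}) && (NumRestarts > {max_restarts})"
```

For example, guarded_periodic_remove(["vanilla"]) yields "periodic_remove = (JobUniverse == 5) && (NumRestarts > 10)", which leaves standard universe jobs alone no matter how often they restart.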


-Peter

--
Peter Couvares                        University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
pfc@xxxxxxxxxxx                       1210 W. Dayton St. Rm #4241
(608) 265-8936                        Madison, WI 53706-1685