
Re: [Condor-users] Maximum number of retries per job?



Hi,

at last I'm going to implement this, but according to the documentation:

NumRestarts
    : A count of the number of restarts from a checkpoint attempted by this job during its lifetime.


so I guess that if I want to do this only for vanilla jobs, I had better use
JobRunCount, which does not seem to be documented in the manual, but which I
assume counts how many times the job has started. I was going to try:

condor_submit -a 'periodic_remove = JobRunCount > 10 && JobUniverse == 5' $*

Any issues with this?
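
For reference, the one-liner above could be turned into a small wrapper
function along the following lines. This is only a sketch under assumptions:
the function name submit_with_policy and the CONDOR_SUBMIT_BIN override are
hypothetical names introduced here for illustration, and JobRunCount is
assumed to count job starts, as guessed above. JobUniverse == 5 selects
vanilla-universe jobs.

```shell
#!/bin/sh
# Hypothetical wrapper sketch around condor_submit; not a tested site policy.
# CONDOR_SUBMIT_BIN is an assumed override so the real binary can be swapped
# out for a dry run; it defaults to condor_submit.
CONDOR_SUBMIT_BIN="${CONDOR_SUBMIT_BIN:-condor_submit}"

# Remove vanilla-universe jobs (JobUniverse == 5) once they have started
# more than 10 times. Assumes JobRunCount counts job starts.
POLICY='periodic_remove = JobRunCount > 10 && JobUniverse == 5'

submit_with_policy() {
    # "$@" (rather than $*) passes each original argument through unchanged,
    # so submit file names containing spaces are not split.
    "$CONDOR_SUBMIT_BIN" -a "$POLICY" "$@"
}
```

A wrapper script would end with a call such as submit_with_policy "$@".
Using "$@" instead of $* is the only change from the command above: it keeps
arguments that contain whitespace intact.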

Thanks a lot,
Angel de Vicente



Peter F. Couvares writes:
 > Angel de Vicente wrote:
 > >> periodic_remove = NumRestarts > 10
 > >
 > > 1. could it be possible then to automatically add this to every user's 
 > > job
 > >    description?
 > 
 > One kludgey-but-effective way to do this now is via a condor_submit 
 > wrapper which adds a "-a periodic_remove = NumRestarts > 10" argument 
 > to submits.
 > 
 > But we're working on a SYSTEM_PERIODIC_REMOVE config expression which 
 > will allow the administrator to set a schedd-wide policy independent 
 > from that which the users set in their personal periodic_remove.  It 
 > should be in an upcoming 6.7 series release (probably 6.7.3), but no 
 > promises.
 > 
 > > 2. I wouldn't want to kill a standard universe job that has restarted 
 > > more than
 > >    10 times. Is there a way to differentiate between restarts in the 
 > > vanilla
 > >    universe and restarts in the standard universe?
 > 
 > The universe of a job is advertised in its "Universe" attribute.  Just 
 > add that to the periodic_remove expression so it only becomes true for 
 > the job universes you want.
 > 
 > -Peter
 > 
 > -- 
 > Peter Couvares                        University of Wisconsin-Madison
 > Condor Project Research               Department of Computer Sciences
 > pfc@xxxxxxxxxxx                       1210 W. Dayton St. Rm #4241
 > (608) 265-8936                        Madison, WI 53706-1685
 > 
 > _______________________________________________
 > Condor-users mailing list
 > Condor-users@xxxxxxxxxxx
 > http://lists.cs.wisc.edu/mailman/listinfo/condor-users

-- 
----------------------------------
http://www.iac.es/galeria/angelv/

PostDoc Software Support
Instituto de Astrofisica de Canarias