Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] Setting up a minimum delay before retrying a job that has failed?

Date: Fri, 12 Apr 2019 14:12:32 +0200 (CEST)
From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
Subject: Re: [HTCondor-users] Setting up a minimum delay before retrying a job that has failed?

Hi Nicolas,

I think you could do that inside the node start expression, something like: 

START = (NumJobStarts < 1) || ((CurrentTime - EnteredCurrentStatus) > (<time in secs>))

That would probably do the trick. It needs testing, of course, as I am too lazy to do that right now, and you will most likely have to combine it with your other START dependencies ;)
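
To illustrate the "combine it with your other START dependencies" part, here is a rough, untested sketch of an execute-node config fragment. RETRY_DELAY is a hypothetical macro name introduced only for this example, and 300 seconds is an arbitrary placeholder, not a recommended value. It assumes START is already defined earlier in the configuration, so that $(START) expands to the existing expression.

```
# Hypothetical macro (not a built-in knob): minimum wait in seconds
# before a job that has already started once may start here again.
RETRY_DELAY = 300

# Keep whatever START required before, and additionally demand that the
# job either has never started or has been in its current status for
# longer than RETRY_DELAY seconds.
START = ($(START)) && ((NumJobStarts < 1) || ((CurrentTime - EnteredCurrentStatus) > $(RETRY_DELAY)))
```

The delay is measured from EnteredCurrentStatus, so this only helps if the job goes back to idle after the failure; the startd needs a condor_reconfig to pick up the change.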

Maybe it's not the most intelligent place to do that either but it's one way to get around your problem for sure .... 

Best
Christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Nicolas Arnaud" <narnaud@xxxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, 12 April 2019 13:15:02
Subject: [HTCondor-users] Setting up a minimum delay before retrying a job that has failed?

Hello,

Is there an easy way to set a (minimum) delay between two (re)tries of 
an HTCondor job? That would help in case the failure is due to a 
transient problem (like the unavailability of the input data) that is 
likely to be solved after O(few minutes) at most. Currently the retries 
are so quick that the maximum number of retries is reached before the 
transient problem gets cleared.

Thanks in advance,

Nicolas
-- 

==========================================
= Nicolas ARNAUD                         =
= Laboratoire de l'Accelerateur Lineaire =
= CNRS/IN2P3 & Université Paris-Sud      =
= Virgo Experiment                       =
=                                        =
= European Gravitational Observatory     =
= Via E. Amaldi, 5                       =
= 56021 Santo Stefano a Macerata         =
= Cascina (PI) -- Italia                 =
= Tel: + 39 050 752 314                  =
==========================================
