Re: [HTCondor-users] How to forbid job restarts
- Date: Wed, 13 Jan 2021 15:32:38 -0600 (CST)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] How to forbid job restarts
After an extensive web search, I cannot seem to find an answer to a
simple question: how do I forbid HTCondor from restarting my jobs?
How can I tell HTCondor that it is forbidden to restart jobs and all the
jobs should be allowed to finish no matter how long it takes?
Generally speaking, whether or not a job is _interrupted_ is up to
the administrator of the startd on which the job is run (and the vagaries
of random failures). Because of the distributed nature of HTCondor, it's
not possible to ensure that a job is only ever started once (a startd could
fall off the network after it receives the job but before it starts it,
for example), but see the following for a discussion of the best that you
can do as a job owner to prevent your job from _restarting_.
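
One commonly cited job-owner recipe, shown here as a sketch rather than a
guarantee, uses the NumJobStarts job attribute: the job only matches while
it has never started, and it is removed from the queue if it ever returns
to idle after a start. Note the trade-off: an interrupted job is removed
rather than rerun, so it will not "finish no matter how long it takes", and
a removed node in a DAG will normally count as failed. The executable name
is hypothetical.

```
# Sketch of a submit file that refuses a second start (assumed recipe).
executable      = my_job.sh

# Only match a machine if this job has never started before.
requirements    = NumJobStarts == 0

# JobStatus == 1 means idle; an idle job that has already started once
# was interrupted, so remove it instead of letting it restart.
periodic_remove = JobStatus == 1 && NumJobStarts > 0

queue
```

Because of the failure modes described above, NumJobStarts may not be
updated in every case, so treat this as best effort.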
What could be the reason the jobs started to restart execution
periodically when run as part of a DAG?
That is indeed a very good question.
I am the administrator of my HTCondor cluster, so I am sure that neither
the HTCondor configuration parameters nor the individual job submit files
were changed.
What happens if you run a DAG like the one which is failing, but whose
jobs just sleep for 75 minutes?
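
A minimal sketch of such a test, assuming a two-node DAG; the file names,
node names, and dependency shape are hypothetical and should be changed to
mirror the failing DAG. 75 minutes is 4500 seconds.

```
# sleep.sub: each node just sleeps for 75 minutes
executable = /bin/sleep
arguments  = 4500
log        = sleep.log
queue
```

```
# sleep.dag: two sleep nodes, B runs after A
JOB A sleep.sub
JOB B sleep.sub
PARENT A CHILD B
```

Submit it with "condor_submit_dag sleep.dag". If a node is restarted,
sleep.log should show more than one execution event for the same job.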