DAGMan's RETRY is not very tunable. It has only two features: retry the node up to n times, and, with the optional UNLESS-EXIT keyword, stop retrying if the node exits with the specified value. But to elaborate on your questions:
Best of luck,
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Nicolas Arnaud <nicolas.arnaud@xxxxxxxxxxxxxxx>
Sent: Friday, August 19, 2022 9:45 AM
To: HTCondor Users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] 2 questions about job retry
I have a couple questions about how to tune the retry of a failed DAG job.
1) What's the best way to wait some seconds before attempting a retry?
I've thought of using a POST script that would have $RETURN among its
arguments and call |sleep| if $RETURN is not equal to 0, but I wonder
whether that would work and whether there is a simpler way to do this.
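For illustration, the POST-script idea in question 1 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested recipe: the script name post_delay.py, the node name A, the submit file job.sub, and the 30-second delay are all hypothetical. It relies on the documented DAGMan behavior that, when a node has a POST script, the script's exit code decides whether the node succeeded, so the script has to exit non-zero after sleeping for RETRY to be triggered.

```python
#!/usr/bin/env python3
"""Hypothetical POST script: pause before DAGMan retries a failed node.

Assumed DAG file lines (all names and the delay are illustrative):
    JOB A job.sub
    SCRIPT POST A post_delay.py $RETURN 30
    RETRY A 3
"""
import sys
import time


def post_delay(return_code, delay_seconds, sleep=time.sleep):
    """Sleep only if the job failed, then hand back the job's return code."""
    if return_code != 0:
        sleep(delay_seconds)
    # The node's result is taken from the POST script, so a failure
    # must be passed through or the node would be marked successful.
    return return_code


if __name__ == "__main__" and len(sys.argv) == 3:
    code = post_delay(int(sys.argv[1]), float(sys.argv[2]))
    # $RETURN can be negative (e.g. job killed by a signal); map anything
    # outside the valid exit-status range to a generic failure code.
    sys.exit(code if 0 <= code <= 255 else 1)
```

The sleep function is passed in as a parameter only so the logic can be exercised without actually waiting; in the DAG the script would be invoked with the job's $RETURN value and the delay as its two arguments.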
2) When a job retries, I would like it *not* to run where the failed job
has run. Searching on the web led me to adding the line
> requirements = Machine =!= LastRemoteHost
to the submit file that is called by the JOB command on the DAG file,
but that doesn't seem to work. More often than not, the job reruns in
the same place (same machine and same slot) as the failed try.
The Condor version I am using is
> $CondorVersion: 9.0.11 Mar 12 2022 BuildID: 578027 PackageID: 9.0.11-1 $
> $CondorPlatform: x86_64_CentOS7 $
Thanks in advance,