
Re: [HTCondor-users] Don't understand RETRY in DAGMan [I think I see the problem] (fwd)



Kent, thanks much for looking at the logs, discovering my error, and suggesting the fix.

Rescue DAG runs now work properly, but all is not quite well yet. After the failed node(s) rerun successfully, condor_dagman.exe doesn't quit; it just remains in the Run state. Is this another option I overlooked? I'd like it to quit, of course, once the resubmission is done and all nodes have finished, whether they succeeded or failed.

On Tue, Oct 21, 2014 at 8:25 AM, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
Oops -- I just did a reply to sender and didn't send this to the list.  I thought I should send it to the list, just to make this clear:  doing

  condor_submit_dag -f <whatever>.dag

will prevent any rescue DAGs from being used (the -f is what's significant).

Kent

---------- Forwarded message ----------
Date: Tue, 21 Oct 2014 09:46:36 -0500 (CDT)
From: R. Kent Wenger <wenger@xxxxxxxxxxxxxxxxx>
To: "Finch, Ralph@DWR" <Ralph.Finch@xxxxxxxxxxxx>
Subject: RE: [HTCondor-users] Don't understand RETRY in DAGMan [I think I see
    the problem]

On Tue, 21 Oct 2014, Finch, Ralph@DWR wrote:

 Why the difference? In the batch file I delete all prior HTC individual log
 files…I realize now the Rescue process needs those log files to properly
 continue. Well, in my defense, keeping the HTC log files is not mentioned as
 a requirement in the manual.

 I’ll modify my script, test it, and if this is true I’ll post an update to
 the email list. Sorry about the confusion.

Actually, the problem is that you're passing the -force flag to condor_submit_dag on the runs that ignore the rescue DAG.  You don't need the individual log files for the rescue DAG to work.

In the condor_submit_dag man page, the description of -force mentions that it will prevent a rescue DAG from being used, but I guess that needs to be added to the rescue DAG subsection of the main DAGMan section.
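
As a rough illustration of the point above, here is a small shell sketch that reports which rescue file is currently sitting next to a DAG file. The helper name latest_rescue_dag is hypothetical (it is not part of HTCondor), and the sketch assumes the default rescue naming of <dagfile>.rescueNNN with a three-digit, zero-padded number.

```shell
#!/bin/sh
# Hypothetical helper, not an HTCondor tool: print the highest-numbered
# rescue DAG found next to the given DAG file, or nothing if none exist.
# Assumes the default naming <dagfile>.rescueNNN (three zero-padded digits).
latest_rescue_dag() {
    dag=$1
    latest=
    # The glob expands in sorted order, so the last existing match
    # is the highest-numbered rescue file.
    for f in "$dag".rescue[0-9][0-9][0-9]; do
        [ -e "$f" ] && latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
    return 0
}
```

If this prints a file name for your DAG, a plain "condor_submit_dag <whatever>.dag" should resume from that rescue DAG, while adding -f (or -force) would run the original DAG from the start instead.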

Kent Wenger
CHTC Team