Re: [HTCondor-users] Condor Submit file: big whole file or several submit files
- Date: Fri, 26 Jul 2013 10:22:03 -0500 (CDT)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
On Fri, 26 Jul 2013, Antonio Chay wrote:
First of all, glad to hear that you're using DAGMan!
> - For DAGMan: How can I have one big submit file and get all the
> "rescue" benefits? (i.e.: re-running failed jobs only).
Unfortunately, there is no way to do this. DAGMan can only re-run jobs at
the granularity of a node, and each node corresponds to one submit file.
(Basically, DAGMan is just running a bunch of condor_submit commands, so it
can't really do anything that you can't do by running condor_submit on the
command line.)
I don't see functionality like this coming any time soon, either -- even
working out how to do it manually gets pretty difficult.
So I guess I'd say that this is a reason to *not* have DAGMan nodes
consist of a large number of procs in a single cluster.
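
As a rough illustration of the alternative, here is a minimal Python sketch that writes a DAG with one node per job, all sharing a single submit file and passing a per-node value through VARS. The submit file name (job.sub), the node names (job0, job1, ...) and the macro name (index) are hypothetical; it assumes the submit file refers to $(index) and queues a single proc.

```python
def make_dag(num_jobs, submit_file="job.sub"):
    """Return DAG file text with one node per job.

    Every node reuses the same (hypothetical) submit file; the VARS
    line defines a macro named "index" that the submit file can
    reference as $(index).
    """
    lines = []
    for i in range(num_jobs):
        node = f"job{i}"
        # One JOB line per node, so a rescue DAG can track each job separately.
        lines.append(f"JOB {node} {submit_file}")
        # Per-node macro value passed to the shared submit file.
        lines.append(f'VARS {node} index="{i}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a three-node DAG; redirect to a file to pass it to condor_submit_dag.
    print(make_dag(3), end="")
```

With a layout like this, a failed node can be re-run on its own, since each job is a separate node rather than one proc among many in a cluster.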
If there's some kind of hierarchical relationship between jobs that you
want to preserve in your DAGs, you might consider using splices or
sub-DAGs -- that would allow your top-level DAG to still be quite simple,
but you'd get the full rescue capability of DAGMan.
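
To make that concrete, here is a small Python sketch that writes a top-level DAG pulling in one lower-level DAG per group, either as splices or as external sub-DAGs. The group names and DAG file names are hypothetical, and each lower-level DAG file is assumed to already exist with one node per job.

```python
def make_top_level_dag(group_dags, use_subdags=False):
    """Return top-level DAG text that includes one lower-level DAG per group.

    group_dags is a list of (name, dag_file) pairs; both values are
    hypothetical placeholders chosen by the caller.
    """
    lines = []
    for name, dag_file in group_dags:
        if use_subdags:
            # Runs the lower-level DAG as a separate DAGMan instance.
            lines.append(f"SUBDAG EXTERNAL {name} {dag_file}")
        else:
            # Merges the lower-level DAG's nodes into this DAG.
            lines.append(f"SPLICE {name} {dag_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    groups = [("groupA", "groupA.dag"), ("groupB", "groupB.dag")]
    print(make_top_level_dag(groups), end="")
```

Either way the top-level file stays a few lines long, while the individual jobs remain separate nodes in the lower-level DAGs.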