
Re: [Condor-users] DAGMan queue query



On Fri, 26 May 2006, Erik Paulson wrote:

> On Fri, May 26, 2006 at 11:46:36AM +0100, o c wrote:
> > The documentation for DAGMan (Condor docs section
> > 2.11.2) says:
> >
> > "each Condor submit description file must submit only
> > one job. There may not be multiple queue commands, or
> > DAGMan will fail."
> >
> > There may not be multiple Queue commands but may there
> > be a single Queue statement with an argument greater
> > than 1?
> >
> > Is this valid for a node in a DAG?
> >
> > Universe = vanilla
> > Executable = work
> > Log = testdag.log
> > Queue 100
> >
> > Should this behave normally under DAGMan?
> >
>
> No, that's prohibited.
>
> With multiple queue statements per file, there is no way for
> DAGMan to resubmit failed jobs without resubmitting all jobs.

Just a note here -- from the section of the manual you reference, I assume
you're running Condor 6.6.x (DAGMan is section 2.12 in the 6.7.x manual).

Anyhow, as of 6.7.18, the above submit file *is* legal for a DAG node job.
DAGMan still can't deal with a single condor_submit generating multiple
clusters, though, so that does put some limitations on what you can
do.  But the above example is fine.
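
As a rough illustration of the rule quoted from the manual, here is a
minimal Python sketch that counts the Queue commands in a submit
description and reports how many procs each one requests. The helper name
count_queue_commands and its parsing rules (one command per line, an
optional integer count, '#' comment lines) are assumptions made for
illustration only. This is not how condor_submit or DAGMan read the file,
and it does not try to predict how many clusters a submission creates.

```python
import re

# Hypothetical helper, not part of Condor: scan a submit description for
# Queue commands. Assumes one command per line, a case-insensitive keyword,
# an optional integer count (default 1), and '#' comment lines.
QUEUE_RE = re.compile(r"^\s*queue(?:\s+(\d+))?\s*$", re.IGNORECASE)

def count_queue_commands(submit_text):
    """Return a list holding the proc count of each Queue command found."""
    counts = []
    for line in submit_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = QUEUE_RE.match(line)
        if match:
            counts.append(int(match.group(1) or 1))
    return counts

# The submit description from the question above.
example = """\
Universe = vanilla
Executable = work
Log = testdag.log
Queue 100
"""

counts = count_queue_commands(example)
print(len(counts), "Queue command(s); procs per command:", counts)
```

For the example from the question, the list has a single entry of 100,
i.e. one Queue command requesting 100 procs.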

Note that, as Erik alludes to above, if the node has to get retried,
all 100 job procs will get re-run, even if 99 of them succeeded the first
time.  Basically, if any of the job procs fail, the whole node is
considered failed.
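
The retry behaviour described above can be sketched in a few lines of
Python. This is only a toy model, not DAGMan's code: run_node,
node_succeeded, and simulated_procs are hypothetical names, and the model
assumes that an exit code of 0 means a proc succeeded.

```python
def node_succeeded(proc_exit_codes):
    """A node counts as successful only if every proc exited with 0."""
    return all(code == 0 for code in proc_exit_codes)

def run_node(run_all_procs, max_retries):
    """Run the node, re-running every proc on each retry.

    run_all_procs is a hypothetical callable that runs all procs of the
    node and returns one exit code per proc.
    Returns (succeeded, total_procs_run).
    """
    total_procs_run = 0
    for _attempt in range(max_retries + 1):
        exit_codes = run_all_procs()
        total_procs_run += len(exit_codes)
        if node_succeeded(exit_codes):
            return True, total_procs_run
    return False, total_procs_run

# Simulated node: 100 procs, where proc 42 fails on the first attempt only.
attempts = []

def simulated_procs():
    attempts.append(1)
    first_attempt = len(attempts) == 1
    return [1 if (first_attempt and proc == 42) else 0 for proc in range(100)]

ok, procs_run = run_node(simulated_procs, max_retries=1)
print(ok, procs_run)
```

In this simulation one failing proc out of 100 forces a second pass over
all 100, so 200 procs run in total before the node is counted as
successful.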

Kent Wenger
Condor Team