
Re: [Condor-users] inner dag in 7.2.0

On Wed, 21 Jan 2009, Dimitri Maziuk wrote:

Here's another 7.2.0 problem:

We run a bunch of parallel jobs followed by one clean-up/summary job that must
run even if some of the former failed. So we have an inner dag for them and
an outer dag that runs the inner one, overrides its return value in a
post-script, and then runs the clean-up job.

The batch is submitted from an NFS share. So we take the output
of "condor_submit_dag -no_submit inner.dag" and replace the cwd that condor
sticks into the "log = " line with /var/tmp. We did that "in place", without
renaming the file. That worked fine until the 7.2.0 upgrade.

What seems to happen now is that when we run "condor_submit_dag outer.dag", it
overwrites our inner submit file and the whole thing fails because the log
file is on NFS. I.e. "condor_submit_dag outer.dag" now also
does "condor_submit_dag -no_submit inner.dag".

That interpretation is correct.  However, you can restore the old
behavior by adding -no_recurse to the command line for condor_submit_dag
for the outer DAG.

If I'm understanding right, your procedure should be like this:

* condor_submit_dag -no_submit inner.dag
* modify inner.dag.condor.sub (point its "log = " line at /var/tmp)
* condor_submit_dag -no_recurse outer.dag
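
The middle step of that procedure could be scripted roughly as follows. This
is only a sketch, not something shipped with Condor: the function names
relocate_log and relocate_log_in_place are made up for illustration, and it
assumes the only change needed is to swap the directory of the path on the
"log = " line for /var/tmp while keeping the file name.

```python
import os
import re

# Matches a submit-file line such as "log = /some/dir/inner.dag.dagman.log".
# Only spaces and tabs are allowed around the pieces, so the match never
# swallows a newline. Keyword case is ignored.
_LOG_LINE = re.compile(
    r"^([ \t]*log[ \t]*=[ \t]*)(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def relocate_log(submit_text, new_dir="/var/tmp"):
    """Return submit_text with the directory of each 'log =' path set to new_dir."""
    def _swap(match):
        # Keep the original file name, replace only the directory part.
        filename = os.path.basename(match.group(2))
        return match.group(1) + os.path.join(new_dir, filename)

    return _LOG_LINE.sub(_swap, submit_text)


def relocate_log_in_place(sub_file, new_dir="/var/tmp"):
    """Rewrite sub_file (e.g. inner.dag.condor.sub) without renaming it."""
    with open(sub_file) as f:
        text = f.read()
    with open(sub_file, "w") as f:
        f.write(relocate_log(text, new_dir))
```

Run relocate_log_in_place("inner.dag.condor.sub") after
"condor_submit_dag -no_submit inner.dag" and before
"condor_submit_dag -no_recurse outer.dag".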

You might also want to check out the -update_submit flag:
  This optional argument causes an existing .condor.sub file to not be
  treated as an error; rather, the .condor.sub file will be overwritten,
  but the existing values of -maxjobs, -maxidle, -maxpre, and -maxpost
  will be preserved.

Kent Wenger
Condor Team