Re: [Condor-users] Passing condor_dagman args with condor_submit_dag?

On Tue, 28 Feb 2006, Armen Babikyan wrote:

> I am attempting to use Condor for a large distributed batch processing
> project.  I'm using condor_dagman as a meta scheduler by limiting the
> number of jobs that occur at the same time.  I've organized each job to
> be an iteration of a loop, and I have 2 layers of recursion.  Let me
> throw some numbers out there: my outer loop iterates 100 times, and my
> inner loop iterates 1000 times (each of these loops contains a DAG).  I
> am implementing looping by unrolling the logical loop into a dynamically
> generated DAG file.
> While my solution might prevent condor's scheduler from getting
> overloaded with jobs, I am faced with another problem: organizing the
> files on disk so that one directory doesn't contain something on the
> order of 100*1000 = 100,000 submit files (and a multiple of that for output
> and log files).  I'm starting with the obvious: make a directory and
> subdirectory for each iteration of the inner loop.  However, I am
> running across a problem.

You might be able to reduce the number of submit files you need by
judicious use of the VARS command in your DAG files, although that
won't reduce the number of input and output files, etc.
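
Since the DAG files are generated dynamically anyway, here is a rough
sketch of what that could look like. It is only an illustration: the
node names (STEP_...), the macro names (outer, inner) and the shared
submit file name (process.sub) are made up for the example, and it
leaves out any PARENT/CHILD lines the real DAG may need.

```python
# Sketch only: emit one inner DAG in which every node reuses a single
# submit file and gets its per-iteration values through VARS.
# "process.sub", "outer", "inner" and the STEP_ prefix are hypothetical.

def make_inner_dag(outer, n_inner, submit_file="process.sub"):
    """Return the text of a DAG file for one outer-loop iteration."""
    lines = []
    for i in range(n_inner):
        node = f"STEP_{outer}_{i}"
        # Every node points at the same submit file...
        lines.append(f"JOB {node} {submit_file}")
        # ...and VARS supplies the values that differ between nodes.
        lines.append(f'VARS {node} outer="{outer}" inner="{i}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Show a two-node DAG for outer iteration 111.
    print(make_inner_dag(111, 2), end="")
```

The shared submit file would then refer to the values as $(outer) and
$(inner), for example in its arguments line.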

> condor_submit_dag -no_submit accepts my *.dag file and produces a
> *.dag.condor.sub file, but I am having trouble properly referencing this
> *.dag.condor.sub file from a *.dag file in the parent directory.  I
> think this is because condor_submit_dag does not let me configure some
> of condor_dagman's arguments in the submit file it generates.  For example:
> outer *.dag file:
> JOB MAINDAG_111 111/maindag_111.dag.condor.sub
> JOB MAINDAG_222 222/maindag_222.dag.condor.sub
> 111/maindag_111.dag.condor.sub:
> # Filename: maindag_111.dag.condor.sub
> # Generated by condor_submit_dag maindag_111.dag
> universe        = scheduler
> executable      = /opt/condor/bin/condor_dagman
> getenv          = True
> output          = maindag_111.dag.lib.out
> error           = maindag_111.dag.lib.out
> log             = maindag_111.dag.dagman.log
> remove_kill_sig = SIGUSR1
> on_exit_remove  = (ExitBySignal == false || ExitSignal =!= 9)
> arguments       = -f -l . -Debug 3 -Lockfile maindag_111.dag.lock
> -Condorlog /tmp/exp6/111/process_a_111.log -Dag maindag_111.dag -Rescue
> maindag_111.dag.rescue -MaxIdle 5 -MaxJobs 1 -UseDagDir
> environment     =
> _CONDOR_DAGMAN_LOG=maindag_111.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
> queue
> When the outer condor_dagman reads and tries to execute the inner loop's
> condor_dagman, it fails, because it looks in the outer directory for
> maindag_111.dag rather than in the directory 111 (where the above submit
> file, and anything related to maindag_111*, is).

Yes, that makes sense given your example.

> Is there a way I can tell condor_submit_dag to pass particular arguments
> (e.g. -Dag, -Rescue, output files) to the submit file it generates?
> It would be cool if there was a way to get condor_dagman to chdir() into
> a directory before executing.  I looked at -UseDagDir, but this will put
> output/log files in the parent directory - something I am trying to avoid.

The way you have things right now, -UseDagDir has no effect because
the DAG file specification in your submit file contains no path.
What -UseDagDir does is make DAGMan chdir() to the DAG file's
directory before it does things like running condor_submit on a
node job.

Okay, I *think* this will solve your problem:  when you run
condor_submit_dag for your inner DAGs, run it in the top-level
directory, like this:

    condor_submit_dag -no_submit -UseDagDir 111/maindag_111.dag
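
If you are generating everything from a script, the same step could be
driven roughly as below. This is an untested sketch: the directory and
file naming follows your example, the helper names are invented for
illustration, and it assumes the generated *.dag.condor.sub file ends
up beside its DAG file, which is where your outer DAG expects it.

```python
# Sketch only: run condor_submit_dag for each inner DAG from the
# top-level directory, then build the outer DAG that references the
# generated submit files. Helper names here are hypothetical.
import subprocess


def submit_dag_command(subdir):
    """Return the argv for generating one inner DAG's submit file."""
    return ["condor_submit_dag", "-no_submit", "-UseDagDir",
            f"{subdir}/maindag_{subdir}.dag"]


def outer_dag_text(subdirs):
    """Return outer DAG text with one JOB line per inner DAG."""
    lines = [f"JOB MAINDAG_{d} {d}/maindag_{d}.dag.condor.sub"
             for d in subdirs]
    return "\n".join(lines) + "\n"


def prepare(subdirs, run=subprocess.run):
    """Generate all inner submit files; must be called from the top level."""
    for d in subdirs:
        # check=True stops at the first condor_submit_dag failure.
        run(submit_dag_command(d), check=True)
    return outer_dag_text(subdirs)


if __name__ == "__main__":
    # Print the command only; calling prepare() needs Condor installed.
    print(" ".join(submit_dag_command("111")))
```

Calling prepare(["111", "222"]) from the top-level directory would run
the command above once per subdirectory and return the two JOB lines
from your outer DAG.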

> I guess I could write my own condor_submit_dag too, but I'd rather not
> go to that extreme. :-)
> Any insight would be great.  Thanks!

Please try things as I suggest above, and let us know if it works.

Kent Wenger
Condor Team