Re: [Condor-users] Passing condor_dagman args with condor_submit_dag?
- Date: Wed, 1 Mar 2006 10:10:16 -0600 (CST)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Passing condor_dagman args with condor_submit_dag?
On Tue, 28 Feb 2006, Armen Babikyan wrote:
> I am attempting to use Condor for a large distributed batch processing
> project. I'm using condor_dagman as a meta scheduler by limiting the
> number of jobs that occur at the same time. I've organized each job to
> be an iteration of a loop, and I have 2 layers of recursion. Let me
> throw some numbers out there: my outer loop iterates 100 times, and my
> inner loop iterates 1000 times (each of these loops contains a DAG). I
> am implementing looping by unrolling the logical loop into a dynamically
> generated DAG file.
> While my solution might prevent condor's scheduler from getting
> overloaded with jobs, I am faced with another problem: organizing the
> files on disk so that one directory doesn't contain something on the
> order of 100*1000 = 100,000 submit files (and a multiple of that for output
> and log files). I'm starting with the obvious: make a directory and
> subdirectory for each iteration of the inner loop. However, I am
> running across a problem.
You might be able to reduce the number of submit files you need by
judicious use of the VARS command, although that won't do anything
about input and output files, etc.
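As a rough illustration of the VARS approach, here is a minimal Python sketch that unrolls one inner loop into a DAG where every node shares a single submit file. The submit file name `process.sub`, the node names, and the macro names `iter` and `outdir` are hypothetical. The shared submit file would refer to them as `$(iter)` and `$(outdir)`.

```python
def make_inner_dag(outer: int, n_inner: int, submit_file: str = "process.sub") -> str:
    """Return the text of a DAG that unrolls one inner loop.

    Every node points at the same (hypothetical) submit file; the
    per-iteration values are passed with VARS instead of writing a
    separate submit file for each node.
    """
    lines = []
    for i in range(n_inner):
        node = f"ITER_{outer}_{i}"  # hypothetical node naming scheme
        lines.append(f"JOB {node} {submit_file}")
        # The submit file can use $(iter) and $(outdir) in its commands.
        lines.append(f'VARS {node} iter="{i}" outdir="{outer}/{i}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_inner_dag(111, 3), end="")
```

This cuts the number of submit files to one per inner DAG, but the per-node output and log files named via `$(outdir)` still have to live somewhere.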
> condor_submit_dag -no_submit accepts my *.dag file and produces a
> *.dag.condor.sub file, but I am having trouble properly referencing this
> *.dag.condor.sub file from a *.dag file in the parent directory. I
> think this is because condor_submit_dag does not let me configure some
> of condor_dagman's arguments in the submit file it generates. For example:
> outer *.dag file:
> JOB MAINDAG_111 111/maindag_111.dag.condor.sub
> JOB MAINDAG_222 222/maindag_222.dag.condor.sub
> # Filename: maindag_111.dag.condor.sub
> # Generated by condor_submit_dag maindag_111.dag
> universe = scheduler
> executable = /opt/condor/bin/condor_dagman
> getenv = True
> output = maindag_111.dag.lib.out
> error = maindag_111.dag.lib.out
> log = maindag_111.dag.dagman.log
> remove_kill_sig = SIGUSR1
> on_exit_remove = (ExitBySignal == false || ExitSignal =!= 9)
> arguments = -f -l . -Debug 3 -Lockfile maindag_111.dag.lock -Condorlog /tmp/exp6/111/process_a_111.log -Dag maindag_111.dag -Rescue maindag_111.dag.rescue -MaxIdle 5 -MaxJobs 1 -UseDagDir
> environment =
> When the outer condor_dagman reads and tries to execute the inner loop's
> condor_dagman, it fails, because it looks in the outer directory for
> maindag_111.dag rather than in the directory 111 (where the above submit
> file, and anything related to maindag_111*, is).
Yes, that makes sense given your example.
> Is there a way I can tell condor_submit_dag to pass particular arguments
> (e.g. -Dag, -Rescue, output files) to the submit file it generates?
> It would be cool if there was a way to get condor_dagman to chdir() into
> a directory before executing. I looked at -UseDagDir, but this will put
> output/log files in the parent directory - something I am trying to avoid.
The way you have things right now, -UseDagDir has no effect because
there is no path in the DAG file specification in your submit file.
Basically, -UseDagDir does cause DAGMan to chdir() to the DAG file's
directory before doing a condor_submit on a node job, or whatever.
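To make that reasoning concrete, here is a small Python model of where a relative path named inside a DAG file ends up, with and without -UseDagDir. It is a simplified sketch of the chdir() behaviour described above, not DAGMan's actual code; the function name and the node submit file `process_a_111.sub` are hypothetical, and `/tmp/exp6` is taken from the log path in your submit file.

```python
import posixpath


def resolve_node_path(dag_arg: str, node_path: str, cwd: str, use_dag_dir: bool) -> str:
    """Model where a relative path inside a DAG file is looked up.

    With use_dag_dir, DAGMan is modeled as chdir()ing to the directory
    part of the -Dag argument; without it, paths stay relative to cwd.
    """
    base = cwd
    if use_dag_dir:
        # dirname() is "" when the DAG file was given without a path,
        # so joining it leaves base unchanged: the flag has no effect.
        base = posixpath.join(cwd, posixpath.dirname(dag_arg))
    return posixpath.normpath(posixpath.join(base, node_path))


if __name__ == "__main__":
    # -Dag maindag_111.dag: no directory part, so nothing changes.
    print(resolve_node_path("maindag_111.dag", "process_a_111.sub", "/tmp/exp6", True))
    # -Dag 111/maindag_111.dag: lookups move into 111/.
    print(resolve_node_path("111/maindag_111.dag", "process_a_111.sub", "/tmp/exp6", True))
```

The first call yields `/tmp/exp6/process_a_111.sub` and the second `/tmp/exp6/111/process_a_111.sub`, which is why the DAG file has to be named with its directory for -UseDagDir to matter.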
Okay, I *think* this will solve your problem: when you run
condor_submit_dag for your inner DAGs, run it in the top-level
directory, like this:
condor_submit_dag -no_submit -UseDagDir 111/maindag_111.dag
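With 100 inner DAGs you would probably want to script this. The Python sketch below only builds the condor_submit_dag command lines (to be run from the top-level directory) and the matching outer DAG text. It assumes the `<n>/maindag_<n>.dag` layout from your example, and it assumes condor_submit_dag writes its submit file next to the DAG file as `<dag>.condor.sub`; please check that against what your version actually produces. The helper names are hypothetical, and actually running the commands (e.g. with subprocess.run) is left out.

```python
def inner_dag_path(n: int) -> str:
    # Layout assumed from the example: 111/maindag_111.dag, 222/maindag_222.dag, ...
    return f"{n}/maindag_{n}.dag"


def submit_commands(ids):
    """condor_submit_dag invocations, meant to be run from the top-level directory."""
    return [["condor_submit_dag", "-no_submit", "-UseDagDir", inner_dag_path(n)]
            for n in ids]


def outer_dag(ids) -> str:
    """Outer DAG whose nodes are the generated inner-DAG submit files."""
    return "".join(f"JOB MAINDAG_{n} {inner_dag_path(n)}.condor.sub\n" for n in ids)


if __name__ == "__main__":
    ids = [111, 222]
    for cmd in submit_commands(ids):
        print(" ".join(cmd))  # on the submit host: subprocess.run(cmd, check=True)
    print(outer_dag(ids), end="")
```

Passing each command as an argument list rather than a single string keeps the DAG paths from being reinterpreted by a shell if you do hand them to subprocess.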
> I guess I could write my own condor_submit_dag too, but I'd rather not
> go to that extreme. :-)
> Any insight would be great. Thanks!
Please try things as I suggest above, and let us know if it works.