
Re: [Condor-users] Bunching many (2-30 sec) calculations as one job



> I am unable to condor_compile my application so I have to use the
> vanilla universe either way. It says it can't find my application
> libraries when running condor_compile. With regular "make" everything
> is fine. Do I need to statically compile my app for this to work?

That sounds familiar; it's worth checking the condor_compile section of
the manual for that.
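
From memory, condor_compile wants to wrap the link step of the build
rather than the whole make. A rough sketch, assuming a gcc-built app
(the file names are made up, and I may be misremembering whether
-static is needed):

# compile as usual
gcc -c myapp.c -o myapp.o

# relink through condor_compile; -static may help if it complains
# about shared libraries it cannot find
condor_compile gcc -static -o myapp myapp.o -lm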

> Anyway, I have been looking at DAGMan for automatic post operations
> (finding the best result from all the DAG jobs after they all are
> finished. I have a child DAG node with a script as executable in the
> "local" universe). But each DAG node/job is submitted as individual
> jobs from the DAGman script and require their own scheduling (or so it
> seems). I have X number of condor submit scripts (DAG nodes) each with
> ONE job/calculation.
> 
> Since one execution can be as short as 2 sec (and the executable is
> ~22MB+libs(~2MB)) I would like to bunch them up a bit so that one job
> submission would run 10 or 100++ calculations. I guess this could be
> done by sending a bash script as executable with 10 or 100 lines of
> the executable with their respective command line arguments. But what
> will happen to the output file from each condor job if there are
> multiple executions of the binary in one job? Will each execution
> overwrite each other's data so I only get the result from the last
> execution? Is there some other way to do this?

If you could have a loop where each iteration:
* generates the input parameters / input filenames
* runs the executable with those arguments
* copies/moves the output/result to a uniquely named file

and then bundles all the results afterwards, that might do the job
(rough sketch below).
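
Something along these lines as the vanilla-universe executable, perhaps;
the binary name, the params file and the tar step are just placeholders
for whatever your setup actually looks like:

#!/bin/sh
# run a whole batch of short calculations inside a single Condor job;
# $1 is the batch number, passed via "arguments" in the submit file
BATCH=$1

i=0
while read ARGS; do
    # one line of params.$BATCH = the command-line arguments for one
    # calculation; each calculation writes to its own file, so nothing
    # gets overwritten
    ./mycalc $ARGS > result.$BATCH.$i
    i=$((i + 1))
done < params.$BATCH

# bundle the results so only one file needs to come back per job
tar czf results.$BATCH.tar.gz result.$BATCH.*

Then transfer_output_files in the submit file can point at just the
tarball rather than hundreds of little files.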
 
> And, by the way, is there any smooth way of sending larger
> jobs/bunches to more powerful nodes more or less automatically? Say I
> have 10 jobs each with 100 computations and nodes ranging from old P3
> to new Core2Quad machines. It would be nice to be able to run a bunch
> maybe 200 calculations on the best machine and 20 on the slowest so
> that they use more or less the same time.

A job can be steered towards more powerful machines using a RANK
expression; doing that only for the jobs with more subjobs would take a
bit more work. Maybe a shell script at the submitting end could generate
the submit file with a RANK expression that depends on the number of
subjobs, something like the sketch below.
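
A rough sketch of that idea, assuming the execute machines advertise the
standard Mips benchmark attribute; the file names, the runbatch.sh
wrapper and the 100-subjob threshold are all made up:

#!/bin/sh
# submit one batch; $1 = batch number, $2 = number of calculations in it
BATCH=$1
NSUB=$2

if [ "$NSUB" -gt 100 ]; then
    RANK="Mips"     # big batch: prefer the fastest machines
else
    RANK="0"        # small batch: no preference, run anywhere
fi

cat > batch.$BATCH.sub <<EOF
universe                = vanilla
executable              = runbatch.sh
arguments               = $BATCH
transfer_input_files    = mycalc,params.$BATCH
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = batch.$BATCH.out
error                   = batch.$BATCH.err
log                     = batch.$BATCH.log
rank                    = $RANK
queue
EOF

condor_submit batch.$BATCH.sub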

Sorry if all this is a bit vague!

JK