
Re: [Condor-users] Preprocessing multiple dependent job results

On 10/18/07, Atle Rudshaug <atle.rudshaug@xxxxxxxxx> wrote:
> Hi!
> I have a calculation which I want to run many times with different
> input (permutations). The jobs are not dependent on each other during
> execution. Currently I am running all the calculations serially on one
> machine, but I want to spread the different calculations
> (permutations) out on a grid to make it faster. The calculations are
> run for each time step in a database and I would like to have the
> results realtime. The individual jobs are pretty small, but there can
> be many of them. I have all the input files for the calculations on a
> NFS server.
> Now my problem. After all the calculations for one time step are
> finished I have to find the job that calculated the best result. This
> result has to be written to a database for that time step (Or maybe I
> could just get the input for the best job and run that single one
> again locally). Is there some built-in way to compare output? I have
> been looking at DAGMan for dependent jobs which looks like something I
> could use. Should I write the result to standard out or to a NFS
> location? Could DRMAA help me?
> I am also unable to relink my executable with the Condor libraries. It
> says it can't find my own libraries, which are to be linked in as well.
> Regular compile without condor_compile works just fine. Therefore I
> have to run with the vanilla universe.
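Since condor_compile isn't cooperating, a vanilla-universe submit file is the usual fallback. A minimal sketch for one time step might look like the following (all file names here are made up for illustration; having each job transfer its output back, rather than relying on NFS from the execute node, makes the "collect and compare" step later much simpler):

```
universe   = vanilla
executable = calc
arguments  = $(Process)

# copy results back to the submit machine instead of writing over NFS
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

output = ts7_job$(Process).out
error  = ts7_job$(Process).err
log    = ts7.log

queue 10
```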

So long as you copy back the relevant outputs from each job, it should
not be hard to have a central process/script/service that spots when a
time step is done, decides which result is best, and writes it to the
database (optionally cleaning up the unneeded outputs, if you're
feeling brave).
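The "pick the best result" part of such a script is small. A sketch, assuming (hypothetically) that each job writes its score as a single number to a file named like ts<N>_job<M>.out:

```python
import glob
import os

def pick_best(output_dir, timestep, lower_is_better=True):
    """Scan every job output file for one time step and return
    (best_file, best_value).

    Assumes each file contains a single numeric score -- adapt the
    parsing to whatever your jobs actually write out.
    """
    best_file, best_value = None, None
    pattern = os.path.join(output_dir, "ts%d_job*.out" % timestep)
    for path in glob.glob(pattern):
        with open(path) as f:
            value = float(f.read().strip())
        if (best_value is None
                or (lower_is_better and value < best_value)
                or (not lower_is_better and value > best_value)):
            best_file, best_value = path, value
    return best_file, best_value
```

From there, writing the winning value into your database (or re-running the winning input locally) is ordinary scripting; the script can simply poll until the expected number of output files for a time step exists.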

If the input to the next time step depends on this step's result, then
you could do some stuff with DAGs such that each time step is a
submission of several independent jobs, all of which feed into a
single job that picks the best result and writes it to the database,
and on which the next batch of parallel jobs depends.
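In DAGMan terms that structure is just PARENT/CHILD lines; a sketch for two time steps (submit file names are made up):

```
# time step 1: independent permutation jobs
JOB ts1_perm0 ts1_perm0.submit
JOB ts1_perm1 ts1_perm1.submit
JOB ts1_perm2 ts1_perm2.submit

# comparison job: picks the best result, writes it to the database
JOB ts1_best  pick_best.submit

PARENT ts1_perm0 ts1_perm1 ts1_perm2 CHILD ts1_best

# time step 2 only starts once the best result of step 1 is in
JOB ts2_perm0 ts2_perm0.submit
PARENT ts1_best CHILD ts2_perm0
```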

You may find it is easier to have this all within the DAG, or to
simply automate the selection/continue process yourself; both have
pros and cons.