
Re: [Condor-users] Notification of multiple job completion



On Fri, 29 Oct 2010, Rob Matthews wrote:

I am new to Condor and am using it for Monte Carlo simulation. Each MC run
is independent and carried out by a given executable which produces a
results file, so I have a wrapper program which populates the inputs for
these and submits all the needed runs to the Condor queue. This all works
great except now I need some way of knowing when all the MC runs I submitted
are complete so I can postprocess results (i.e. parse all the individual
results files and operate as needed).

Right now my wrapper code does this by polling the local directory every 5
seconds looking for the needed results files but this becomes inefficient
with large simulations. Is there a mechanism in Condor to execute a
program (like my postprocessing code) once all the jobs submitted to the
queue are complete?

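You can do this by putting all of your MC jobs into a DAG with no dependencies: the DAGMan job itself finishes only once every node in the DAG has finished, so its completion tells you that all of the runs are done (see http://www.cs.wisc.edu/condor/manual/v7.5/2_10DAGMan_Applications.html#SECTION003106500000000000000 for info about DAGMan).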
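
As a rough illustration of the flat DAG, here is a minimal Python sketch that generates the DAG file text with one independent node per MC run. The submit file name mc_run.sub, the node names MC0, MC1, ... and the run macro are assumptions made up for this example, not anything from the original setup; the submit file would be expected to pick up the run index as $(run).

```python
def flat_dag(num_runs, submit_file="mc_run.sub"):
    """Return DAG file text with one independent node per MC run.

    submit_file and the "run" macro are hypothetical names for this sketch.
    """
    lines = []
    for i in range(num_runs):
        node = f"MC{i}"
        # JOB declares a node and the submit description file it uses.
        lines.append(f"JOB {node} {submit_file}")
        # VARS passes a per-node macro, usable as $(run) in the submit file.
        lines.append(f'VARS {node} run="{i}"')
    # No PARENT/CHILD lines: every node is independent of the others.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(flat_dag(3), end="")
```

Writing that text to a file (say mc_flat.dag, again a made-up name) and running condor_submit_dag on it would submit the whole set as one DAGMan job.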

However, from your description, it sounds like you might benefit from using DAGMan for more than just getting notification when things are done. You could make the code that creates the input files a node in the DAG, make all of the actual MC jobs dependent on that node, and then add another node that does the postprocessing and is dependent on all of the MC nodes. This would get you the correct sequence of job submissions without any coding on your part, and it would also let you get rid of the wrapper code that does the actual submits. Plus you get all the other goodness of DAGMan, like options to retry failed nodes...
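
The three-stage layout described above could be sketched roughly as follows. This is only an illustration under assumptions: the node names (Setup, MC0, MC1, ..., Post) and the submit file names (setup.sub, mc_run.sub, postprocess.sub) are hypothetical, and the retry count is an arbitrary choice.

```python
def staged_dag(num_runs, retries=2):
    """Return DAG file text: Setup -> all MC nodes -> Post.

    All node and submit file names here are hypothetical placeholders.
    """
    if num_runs < 1:
        raise ValueError("need at least one MC run")

    mc_nodes = [f"MC{i}" for i in range(num_runs)]

    # Node that creates the input files for every run.
    lines = ["JOB Setup setup.sub"]

    for i, node in enumerate(mc_nodes):
        lines.append(f"JOB {node} mc_run.sub")
        lines.append(f'VARS {node} run="{i}"')   # available as $(run)
        lines.append(f"RETRY {node} {retries}")  # re-run a failed node

    # Node that parses the individual results files.
    lines.append("JOB Post postprocess.sub")

    all_mc = " ".join(mc_nodes)
    # Every MC node waits for Setup; Post waits for every MC node.
    lines.append(f"PARENT Setup CHILD {all_mc}")
    lines.append(f"PARENT {all_mc} CHILD Post")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(staged_dag(3), end="")
```

With a file like this, DAGMan handles the ordering, so the postprocessing node starts only after every MC node has succeeded and no directory polling is needed.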

Kent Wenger
Condor Team