
Re: [HTCondor-users] Determine when all jobs in a cluster have finished?



On Wed, Jan 30, 2013 at 01:13:23PM -0500, Brian Pipa wrote:
> On Wed, Jan 30, 2013 at 12:55 PM, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
> > On Wed, 30 Jan 2013, Brian Pipa wrote:
> >
> >> I'd really like the whole thing to be self-contained in one DAG like:
> >> ###
> >> Job QueryDB querydb.job
> >> Job Workers workers.job
> >> Job PostProcess postprocess.job
> >> PARENT QueryDB CHILD Workers
> >> PARENT Workers CHILD PostProcess
> >> ###
> >>
> >> since that seems much simpler and self-contained but I don't think
> >> that's doable since the results of the QueryDB job determine the data
> >> and number of worker jobs I'll need. For example, one run of QueryDB
> >> could get 2 million results and I would create 2000 data files
> >> containing 1000 entries each and those would be consumed by 2000
> >> worker jobs. Another run might create only 1 data file and 1 worker. I
> >> can't think of a way to get this all working within one DAG file.
> >> Right now, I pass in to each worker an argument of the datafile to
> >> process.
> >
> >
> > Actually, you *can* do this.  The "trick" is that the workers.job file would
> > not exist at the time you submit the overall DAG.  The workers.job file
> > would be written by the QueryDB job (or maybe a post script to the QueryDB
> > job).  So the workers.job file could be customized to create however many
> > workers you needed based on the results of the query.
> >
> > If you want to use a post script, the syntax is like this:
> >
> >   SCRIPT POST <job> <script> [arguments]
> >
> > So you could write something like a perl script that figures out how many
> > workers you want, and writes the workers.job file accordingly.
> >
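
For what it's worth, here is a minimal sketch of that approach; the script
name, the data-file layout and the worker executable below are assumptions
for illustration, not something from this thread. The DAG would gain one line:

  SCRIPT POST QueryDB write_workers.pl

and the script itself could be as simple as:

#####
#!/usr/bin/perl
# write_workers.pl (illustrative): turn the data files produced by QueryDB
# into a submit file with one queued worker per file.
use strict;
use warnings;

# assume QueryDB drops its output into ./data/ next to the DAG file
my @datafiles = sort glob('data/*.txt');
die "no data files found\n" unless @datafiles;

open my $out, '>', 'workers.job' or die "cannot write workers.job: $!";
print $out "universe   = vanilla\n";
print $out "executable = worker\n";
print $out "log        = workers.log\n";
for my $f (@datafiles) {
    print $out "arguments  = $f\n";   # each worker gets its data file as an argument
    print $out "queue\n";
}
close $out or die "close workers.job: $!";
exit 0;   # a non-zero exit would make DAGMan treat the QueryDB node as failed
#####
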
> Yes, this is what I plan on doing now based on some other replies.
> But, I plan on just writing the job files directly from the QueryDB
> job (which is java code and I already have the code written to split
> it into pieces and make jobs from it). The one thing I still need is a
> way to use a variable in the job file paths for the jobs after the first.
> I'm trying to keep each run separated into its own directory...
> 
> Right now (not using DAG) each run is going into a directory based on
> the Cluster id of the querydb job like:
> /workspace/jobs/(Cluster)/
> but once I move to having it all in one dag file like:
> #####
>  Job QueryDB querydb.job
>  Job Workers workers.job
>  Job PostProcess postprocess.job
>  PARENT QueryDB CHILD Workers
>  PARENT Workers CHILD PostProcess
> #####
> 
> I really want it to be something like:
> #####
>  Job QueryDB querydb.job
>  Job Workers /workspace/jobs/(QueryDBCluster)/workers.job
>  Job PostProcess /workspace/jobs/(QueryDBCluster)/postprocess.job
>  PARENT QueryDB CHILD Workers
>  PARENT Workers CHILD PostProcess
> #####
Run PostProcess as a subdag, e.g.,

SUBDAG EXTERNAL PostProcess postprocess.dag

instead of 

Job PostProcess /workspace/jobs/(QueryDBCluster)/postprocess.job

When you create the subdag as part of QueryDB, you can write whatever paths
you want into it.
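
To make that concrete, the two files could look something like this; the
<cluster> placeholder and the per-run layout are just examples following the
paths quoted above:

#####
# top-level DAG: fixed, nothing in it changes from run to run
Job QueryDB querydb.job
Job Workers workers.job
SUBDAG EXTERNAL PostProcess postprocess.dag
PARENT QueryDB CHILD Workers
PARENT Workers CHILD PostProcess

# postprocess.dag: written by the QueryDB job (or its POST script), so it
# can point into whatever run directory that job just created, e.g.
Job PostProcess /workspace/jobs/<cluster>/postprocess.job
#####

The top-level file never changes between runs; everything run-specific lives
only in the files that the QueryDB job writes.
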
> 
> So the querydb.job is always the same and doesn't need to change, but
> the other jobs' file locations will vary. So how can I get the
> $(Cluster) from the first job and use that in the job path for the
> others (so that I won't have one .dag file that could be overwritten
> each time)? Or is there a better/easier way to do this? Basically, I'm
> using the QueryDB's $(Cluster) as the key to where to read/write all
> the files for the run. This will be used by the workers and by the
> postprocessor (and anything else that may arise that goes into the
> chain/DAG). I could certainly do some coding myself to accomplish this
> (like reading the /workspace/jobs/ dir and using the most recent
> directory, or even writing a text file with just the $(Cluster) in
> it and reading from that) but I bet there's a way to do this using
> just Condor variables(?)