
Re: [HTCondor-users] Determine when all jobs in a cluster have finished?



On Wed, Jan 30, 2013 at 12:55 PM, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
> On Wed, 30 Jan 2013, Brian Pipa wrote:
>
>> I'd really like the whole thing to be self-contained in one DAG like:
>> ###
>> Job QueryDB querydb.job
>> Job Workers workers.job
>> Job PostProcess postprocess.job
>> PARENT QueryDB CHILD Workers
>> PARENT Workers CHILD PostProcess
>> ###
>>
>> since that seems much simpler and self-contained but I don't think
>> that's doable since the results of the QueryDB job determine the data
>> and number of worker jobs I'll need. For example, one run of QueryDB
>> could get 2 million results and I would create 2000 data files
>> containing 1000 entries each and those would be consumed by 2000
>> worker jobs. Another run might create only 1 data file and 1 worker. I
>> can't think of a way to get this all working within one DAG file.
>> Right now, I pass in to each worker an argument of the datafile to
>> process.
>
>
> Actually, you *can* do this.  The "trick" is that the workers.job file would
> not exist at the time you submit the overall DAG.  The workers.job file
> would be written by the QueryDB job (or maybe a post script to the QueryDB
> job).  So the workers.job file could be customized to create however many
> workers you needed based on the results of the query.
>
> If you want to use a post script, the syntax is like this:
>
>   SCRIPT POST <job> <script> [arguments]
>
> So you could write something like a perl script that figures out how many
> workers you want, and writes the workers.job file accordingly.
>
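
(So concretely, if I went the POST script route, the DAG would look
something like this, with generate_workers.pl just a placeholder name
for whatever writes workers.job:)
#####
 Job QueryDB querydb.job
 SCRIPT POST QueryDB generate_workers.pl
 Job Workers workers.job
 Job PostProcess postprocess.job
 PARENT QueryDB CHILD Workers
 PARENT Workers CHILD PostProcess
#####
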
Yes, that's what I plan on doing now, based on some other replies.
But I plan on writing the job files directly from the QueryDB job
itself (it's Java code, and I already have the code written to split
the results into pieces and make jobs from them). The one thing I
still need is a way to use a variable in the job file paths for the
jobs after the first, since I'm trying to keep each run separated
into its own directory...
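
The workers.job that the QueryDB job writes out would be along these
lines (the Worker class, the run directory, and the file names are
all made up here), with one queue statement per data file:
#####
 # workers.job -- generated by the QueryDB job for one run;
 # 12345 stands in for the run directory keyed off the QueryDB job
 universe   = java
 executable = Worker.class
 log        = workers.log

 # one block like this per data file produced by the query
 arguments  = Worker /workspace/jobs/12345/data_0001.txt
 output     = worker_0001.out
 error      = worker_0001.err
 queue

 arguments  = Worker /workspace/jobs/12345/data_0002.txt
 output     = worker_0002.out
 error      = worker_0002.err
 queue
#####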

Right now (not using a DAG), each run goes into a directory based on
the Cluster id of the querydb job, like:
/workspace/jobs/(Cluster)/
but once I move to having it all in one DAG file, like:
#####
 Job QueryDB querydb.job
 Job Workers workers.job
 Job PostProcess postprocess.job
 PARENT QueryDB CHILD Workers
 PARENT Workers CHILD PostProcess
#####

I really want it to be something like:
#####
 Job QueryDB querydb.job
 Job Workers /workspace/jobs/(QueryDBCluster)/workers.job
 Job PostProcess /workspace/jobs/(QueryDBCluster)/postprocess.job
 PARENT QueryDB CHILD Workers
 PARENT Workers CHILD PostProcess
#####

So querydb.job is always the same and doesn't need to change, but the
other job files' locations will vary. How can I get the $(Cluster)
from the first job and use it in the job file paths for the others
(so that I won't have one .dag file that could be overwritten each
time)? Or is there a better/easier way to do this? Basically, I'm
using QueryDB's $(Cluster) as the key to where all the files for the
run are read and written. It will be used by the workers and by the
postprocessor (and anything else that may arise and goes into the
chain/DAG). I could certainly do some coding myself to accomplish
this (like reading the /workspace/jobs/ dir and using the most recent
directory, or even writing a text file with just the $(Cluster) in it
and reading from that), but I bet there's a way to do this using just
Condor variables(?)
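
For concreteness, the kind of querydb.job I have in mind (the class
name and paths are made up) just hands the job its own cluster id so
the Java code knows where to write everything:
#####
 # querydb.job -- sketch only; QueryDB.class and the paths are made up
 universe   = java
 executable = QueryDB.class
 # $(Cluster) expands to this job's cluster id at submit time, so the
 # Java code can create /workspace/jobs/<cluster>/ and write the data
 # files plus workers.job there
 arguments  = QueryDB /workspace/jobs/$(Cluster)
 log        = querydb.log
 queue
#####
The part I can't see how to do with plain DAG syntax is getting that
same value into the Workers and PostProcess job file paths above.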

Brian