
Re: [HTCondor-users] Determine when all jobs in a cluster have finished?

On Wed, Jan 30, 2013 at 01:13:23PM -0500, Brian Pipa wrote:
> Right now (not using DAG) each run is going into a directory based on
> the Cluster id of the querydb job like:
> /workspace/jobs/(Cluster)/
> but once I move to having it all in one dag file like:
> #####
>  Job QueryDB querydb.job
>  Job Workers workers.job
>  Job PostProcess postprocess.job
>  PARENT QueryDB CHILD Workers
>  PARENT Workers CHILD PostProcess
> #####
> I really want it something like:
> #####
>  Job QueryDB querydb.job
>  Job Workers /workspace/jobs/(QueryDBCluster)/workers.job
>  Job PostProcess /workspace/jobs/(QueryDBCluster)/postprocess.job
>  PARENT QueryDB CHILD Workers
>  PARENT Workers CHILD PostProcess
> #####

dagman itself runs as a job and hence has its own cluster ID, and each job
dagman submits gets its own cluster ID.  But you won't know what these
are going to be at the time you submit the DAG.

You could allocate your own ID and create a new DAG file for each set of
jobs: e.g.

cat <<EOS >myjob$$.dag
Job Workers /workspace/jobs/$$/workers.job
... etc
EOS
condor_submit_dag myjob$$.dag

(Not entirely safe because $$ values can be recycled, but you get the idea).
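A safer variant (a sketch, not from the thread above) uses mktemp to allocate
the per-run directory, so the ID can't be recycled the way a PID can; the
/tmp paths and job file names here are illustrative:

```shell
#!/bin/sh
# Sketch: mktemp -d creates a directory with a name guaranteed unique on
# this host, avoiding PID reuse.  Paths and job file names are illustrative.
RUNDIR=$(mktemp -d /tmp/jobs.XXXXXX)

cat >"$RUNDIR/myjob.dag" <<EOS
Job QueryDB querydb.job
Job Workers $RUNDIR/workers.job
Job PostProcess $RUNDIR/postprocess.job
PARENT QueryDB CHILD Workers
PARENT Workers CHILD PostProcess
EOS

# Submit only where HTCondor is actually installed
if command -v condor_submit_dag >/dev/null 2>&1; then
    condor_submit_dag "$RUNDIR/myjob.dag"
fi
```

The directory name doubles as the run ID, so the workers and post-process
jobs can reference it directly in their paths.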

It may seem messy to write out a separate DAG file each time you want to run
a collection of jobs.  However, be aware that dagman writes out its
state files with the DAG filename as the base, so if you want to run
multiple DAGs concurrently they'll need to be separate DAG files anyway.
Also, having a concrete DAG file on disk makes it possible for you to
re-submit a particular DAG later.

You could instead create a new subdirectory for each DAG, copy the DAG file
there, and either cd into it or run with condor_submit_dag's -usedagdir
option.  You should end up with all the temporary files created within that
subdirectory.
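A minimal sketch of that per-directory workflow (the directory name and the
stand-in DAG content are illustrative, not from the thread):

```shell
#!/bin/sh
# Sketch: give each DAG run its own directory.  With -usedagdir, dagman
# uses the directory containing the DAG file as its working directory, so
# its state, log and rescue files land there rather than in the cwd.
printf 'Job QueryDB querydb.job\n' > myjob.dag   # stand-in DAG for the example
DAGDIR=$(mktemp -d /tmp/dagrun.XXXXXX)           # illustrative per-run dir
cp myjob.dag "$DAGDIR"/

# Submit only where HTCondor is actually installed
if command -v condor_submit_dag >/dev/null 2>&1; then
    condor_submit_dag -usedagdir "$DAGDIR/myjob.dag"
fi
```

Because each run gets its own directory, the same DAG filename can be
submitted many times concurrently without the state files colliding.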