
Re: [HTCondor-users] Determine when all jobs in a cluster have finished?

Three answers so far - thanks - let me address each one:
>I believe you are looking for "condor_wait".  The following page has all the info you need.
> http://research.cs.wisc.edu/htcondor/manual/current/condor_wait.html
That definitely seems better than the condor_q options, but it still doesn't seem ideal.
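For the record, here's roughly what the condor_wait route would look like. Everything here is a hedged sketch: the submit-file name (mycluster.sub) and log path (/tmp/mycluster.log) are hypothetical, and the key requirement is just that every job in the cluster writes events to the same user log:

```shell
# Hypothetical submit-file fragment: point every job in the cluster at
# one shared user log (names/paths are made up for this sketch).
cat >> mycluster.sub <<'EOF'
log = /tmp/mycluster.log
EOF

# After condor_submit mycluster.sub, condor_wait blocks until every job
# recorded in that log has left the queue; -wait sets a timeout in seconds.
# (Guarded so the sketch is harmless on a machine without HTCondor.)
if command -v condor_wait >/dev/null 2>&1; then
    condor_wait -wait 3600 /tmp/mycluster.log
fi
```

The downside, as noted, is that this still ties up a process waiting on the log rather than giving an event-driven notification.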

>Won't  NOTIFICATION=complete in your job submission do it?
>Should email when the cluster is complete - though it may email you when each job completes which you probably don't want ...
>--Russell Smithies
I don't want an email, I need programmatic notification, so I don't
think that will work for me. I don't want to kick off a separate
process that monitors email.

>I ran into a similar issue recently.  One option is to use DAGMan with a single node representing your job.  DAGMan will monitor the job for you and report completion.
Ah - this looks like just what I need. I'll have to re-architect my
code a bit, but that's fine. Thanks! Wait - can you elaborate on "use
DAGMan with a single node representing your job"? Is that what I
described below?

So, with DAGMan it looks like I will need to keep my DBQueries job
completely separate; it will create a DAGMan job and submit it such
that it creates multiple jobs in a cluster (the workers), and those
must all finish before we can post-process. I guess that would
work... it connects the processing with the post-processing, but the
pre-processing (the DB query) is essentially separate (not managed by
DAGMan).

So, my dag file would look like:

    JOB Workers workers.job
    JOB PostProcess postprocess.job
    PARENT Workers CHILD PostProcess

and the initial (non-DAGMan) QueryDB job would create workers.job,
postprocess.job, and the dag file, then submit the DAG job.
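The last step of QueryDB could be sketched like this (workers.job, postprocess.job, and my.dag are just the names from the plan above, and QueryDB would have to have written the two submit files already; the actual submission is guarded so the sketch also runs on a machine without HTCondor):

```shell
# Generate the DAG file tying the worker cluster to the post-processing step.
cat > my.dag <<'EOF'
JOB Workers workers.job
JOB PostProcess postprocess.job
PARENT Workers CHILD PostProcess
EOF

# Hand the DAG to DAGMan: it submits the Workers node, watches it, and
# only submits PostProcess once every Workers job has completed.
# (Guarded so this sketch is harmless where HTCondor isn't installed.)
if command -v condor_submit_dag >/dev/null 2>&1; then
    condor_submit_dag my.dag
fi
```

The nice part is that DAGMan does all the completion tracking, so nothing has to poll the queue.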


On Tue, Jan 29, 2013 at 5:18 PM, Brian Pipa <brianpipa@xxxxxxxxx> wrote:
> Short: I'm trying to figure out when all jobs from a job cluster have
> finished so that I can do some post-processing. I can think of lots of
> ways for me to code this up, but it seems like there would be some
> easy way in Condor to do this - does anyone know how?
> Long: I have a single Java master task (that is also a Condor job,
> though that's not relevant) that does a large DB query then splits the
> results into chunks and submits each chunk to Condor as a job via one
> ClassAd so they all have the same Cluster id. These jobs are all Java
> worker jobs that call various tools to process the data. I have all of
> the output for each worker cluster going to a single directory so it's
> easy to keep them together and know what output is from which run. As
> I said above, I can think of a bunch of ways I could code up a
> solution but it seems like Condor might have a way to tell if a
> Cluster of jobs has finished or not.  Does anyone know if Condor does
> have a way to do this?
> UPDATE: while typing this email up I found:
> condor_q <cluster>
> which might work. When I submit the one big worker job, I capture the
> output from condor_submit and I can parse out the id from that "X
> job(s) submitted to cluster Y".  Then, after I submit the job, I can
> call
> condor_q Y
> periodically until it tells me no more jobs are in the queue,
> or I could call
> condor_q Y | grep Y
> until I get nothing back.
> Does this sound right/make sense? Is there an easier way to do this?
> My way seems kind of hacky, though I think it should work.
> Thanks!
> Brian
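For comparison, the condor_q polling idea from my original message above might look roughly like this. The condor_submit message is simulated with a sample string (cluster id 1234 is made up); in the real workflow it would be `out=$(condor_submit workers.job)`:

```shell
# Sample condor_submit message standing in for the real captured output.
out="5 job(s) submitted to cluster 1234."

# Pull the cluster id out of the "X job(s) submitted to cluster Y." line.
cluster=$(printf '%s\n' "$out" | sed -n 's/.*submitted to cluster \([0-9]*\).*/\1/p')
echo "cluster id: $cluster"

# Poll until condor_q no longer lists any job row (e.g. "1234.0") for
# that cluster; the sleep interval is arbitrary.
while condor_q "$cluster" 2>/dev/null | grep -q "^ *$cluster\."; do
    sleep 60
done
```

It works, but it's exactly the kind of busy-wait hack the DAGMan suggestion avoids.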