
[HTCondor-users] Determine when all jobs in a cluster have finished?

Short: I'm trying to figure out when all jobs from a job cluster have
finished so that I can do some post-processing. I can think of lots of
ways for me to code this up, but it seems like there would be some
easy way in Condor to do this - does anyone know how?

Long: I have a single Java master task (that is also a Condor job,
though that's not relevant) that does a large DB query then splits the
results into chunks and submits each chunk to Condor as a job via one
ClassAd so they all have the same Cluster id. These jobs are all Java
worker jobs that call various tools to process the data. I have all of
the output for each worker cluster going to a single directory so it's
easy to keep them together and know what output is from which run. As
I said above, I can think of several ways to code up a solution
myself, but it seems like Condor might already have a way to tell
whether a cluster of jobs has finished. Does anyone know if it does?

UPDATE: while typing this email up I found:
condor_q <cluster>
which might work. When I submit the one big worker job, I capture the
output from condor_submit and parse the cluster id out of the line "X
job(s) submitted to cluster Y". Then, after submitting the job, I can run
condor_q Y
periodically until it reports that no more jobs are in the queue.
Or I could call
condor_q Y | grep Y
until I get nothing back.
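
For concreteness, here is a rough sketch of that polling idea. It is in
Python only for brevity; the same steps should carry over to the Java
master task. The function names, the 30-second poll interval, and the
injectable run/sleep parameters are my own assumptions for illustration,
not anything Condor provides. It uses condor_q's -format option to print
one line per job still in the queue, which avoids grepping the
human-readable listing.

```python
import re
import subprocess
import time


def parse_cluster_id(submit_output):
    """Extract Y from the 'X job(s) submitted to cluster Y' line of condor_submit."""
    match = re.search(r"(\d+) job\(s\) submitted to cluster (\d+)", submit_output)
    if match is None:
        raise ValueError("no 'submitted to cluster' line in condor_submit output")
    return int(match.group(2))


def cluster_in_queue(cluster_id, run=subprocess.run):
    """Return True while condor_q still lists at least one job of the cluster.

    With -format, condor_q prints one ClusterId line per matching job and
    nothing at all once the cluster has left the queue.
    """
    result = run(
        ["condor_q", str(cluster_id), "-format", "%d\n", "ClusterId"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() != ""


def wait_for_cluster(cluster_id, poll_seconds=30,
                     in_queue=cluster_in_queue, sleep=time.sleep):
    """Block until no job of the cluster remains in the queue."""
    while in_queue(cluster_id):
        sleep(poll_seconds)


if __name__ == "__main__":
    # Hypothetical submit file name; substitute the real one.
    submit = subprocess.run(["condor_submit", "workers.sub"],
                            capture_output=True, text=True, check=True)
    cluster = parse_cluster_id(submit.stdout)
    wait_for_cluster(cluster)
    print("cluster", cluster, "has left the queue; start post-processing")
```

One caveat with this approach: a job that goes on hold stays in the
queue, so the loop would wait indefinitely, and a job that is removed
leaves the queue without having completed. "No longer in the queue" is
therefore not quite the same thing as "finished successfully".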

Does this sound right? Is there an easier way to do this?
My way seems kind of hacky, though I think it should work.