[HTCondor-users] Image processing with HTCondor


I’m pretty new to Condor and I’m trying to understand the best approach for our application. We need to process thousands of images through a series of algorithms that do things like feature extraction. These algorithms can be, and have been, represented in a DAG like this:




JOB   ProcessingArea    PA.condor
JOB   EDMS0             EDMS0.condor
JOB   QVT               QVT.condor
JOB   MMlnD             MMLND.condor
JOB   PCFF              PCFF.condor
JOB   EDMS1             EDMS1.condor
JOB   DLP               DLP.condor


PARENT      ProcessingArea    CHILD EDMS0 EDMS1 QVT MMlnD PCFF




DOT dvf.dot


See attached for the diagram. I’ve been doing a lot of reading lately, trying to figure out the best (or a good enough) approach for our application. One change I’ll be making is to use the VARS syntax so that the DAG needs only a single submit file: every algorithm is implemented in the same executable, and only one or two command-line arguments vary between algorithms.
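To make the VARS idea concrete, here is a rough sketch of how I imagine generating such a DAG. The node names come from the DAG above; the shared submit file name algo.condor and the macro name algorithm are placeholders I made up, and only the one PARENT/CHILD line written out above is reproduced (any edges that appear only in the attached diagram are left out).

```python
# Node names as listed in the DAG above. The submit file name and the
# macro name "algorithm" are hypothetical placeholders.
ALGORITHMS = ["ProcessingArea", "EDMS0", "QVT", "MMlnD", "PCFF", "EDMS1", "DLP"]


def build_dag_text(algorithms, submit_file="algo.condor"):
    """Return DAG text in which every node shares one submit file."""
    lines = []
    for name in algorithms:
        # Every node points at the same submit file ...
        lines.append(f"JOB {name} {submit_file}")
        # ... and VARS supplies the per-node value of the macro.
        lines.append(f'VARS {name} algorithm="{name}"')
    # Only the dependency written out in the DAG above is reproduced;
    # edges shown only in the attached diagram are not guessed at.
    children = [a for a in ("EDMS0", "EDMS1", "QVT", "MMlnD", "PCFF")
                if a in algorithms]
    if "ProcessingArea" in algorithms and children:
        lines.append("PARENT ProcessingArea CHILD " + " ".join(children))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag_text(ALGORITHMS), end="")
```

The shared submit file would then pick up the value through $(algorithm), for example in its arguments line.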


We need to run these seven algorithms over each image. The images are all in separate directories, so I’m trying to figure out how others approach this. I thought I’d be able to use something like the flexible queue command to iterate over each image, but my reading through the mailing list archive explained why this isn’t supported with DAGMan. At this point the only thing I’ve come up with is to write a script that creates a unique DAG file for each image, and then either submit each DAG file individually or wrap all of the individual DAGs into a “master” DAG as SUBDAGs.
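The script I have in mind would look roughly like the sketch below. It assumes that a copy of the per-image DAG, here called image.dag, already sits in each image directory; that file name, the IMG node names, and the helper function names are all made up for illustration. It also assumes directory paths without spaces.

```python
import os


def find_image_dirs(root):
    """Return the sorted subdirectories of root, one per image."""
    entries = (os.path.join(root, name) for name in os.listdir(root))
    return sorted(path for path in entries if os.path.isdir(path))


def write_master_dag(image_dirs, master_path, dag_name="image.dag"):
    """Write a master DAG with one SUBDAG EXTERNAL node per image directory.

    Assumes a per-image DAG named dag_name (hypothetical) already
    exists inside each image directory. Returns the number of nodes.
    """
    lines = []
    for index, image_dir in enumerate(image_dirs):
        node = f"IMG{index:05d}"
        # DIR asks DAGMan to run the sub-DAG from the image's directory.
        lines.append(f"SUBDAG EXTERNAL {node} {dag_name} DIR {image_dir}")
    with open(master_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)
```

The resulting master file would be submitted once with condor_submit_dag, rather than submitting thousands of DAGs individually.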


I guess I’m ultimately asking for pointers: what approaches have others used in situations like this?


-Sean Milligan

Attachment: dag.jpeg