
[Condor-users] Condor's DAG scalability?


I'm planning to use Condor on a cluster of ~50 CPUs to carry out a large set of experiments. Each experiment will have several different modules, which need to be executed sequentially. My block diagrams of each experiment are arranged such that both looping and nested looping need to occur. Fortunately, iterations of loops are completely independent of each other data-wise.

I see that Condor's DAG functionality allows only one job per submit file, referenced with the "JOB" directive. Therefore, the most straightforward way I see to condor-ize my experiment is to dynamically generate a DAG file with (potentially) hundreds or thousands of JOB entries, plus PARENT/CHILD entries with hundreds or thousands of arguments.
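For what it's worth, a minimal sketch of what I have in mind for the generator — module names, submit-file names, and counts below are made up for illustration, not my actual experiment:

```python
# Sketch: emit a DAGMan input file for N independent loop iterations,
# each a sequential chain of module jobs. Since iterations are
# data-independent, only jobs within one iteration are ordered.
# Module/submit-file names here ("prep", "run", "post") are hypothetical.

def write_dag(path, n_iterations, modules=("prep", "run", "post")):
    lines = []
    for i in range(n_iterations):
        prev = None
        for m in modules:
            name = f"{m}_{i}"
            # Each JOB line references an ordinary Condor submit file.
            lines.append(f"JOB {name} {m}.sub")
            if prev is not None:
                # PARENT/CHILD enforces sequential order within an iteration.
                lines.append(f"PARENT {prev} CHILD {name}")
            prev = name
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

lines = write_dag("experiment.dag", 3)
```

With thousands of iterations this produces thousands of JOB lines and nearly as many PARENT/CHILD lines, which is exactly the scale I'm asking about.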

May I solicit some words of wisdom with respect to the scalability of Condor's DAG functionality as I will be using it? :-) Have others used Condor's DAG tools for single experiments in which there are thousands (or even millions) of component processes? Of course, some of these components will be hidden under nested condor_dagman executions, but nevertheless, there will be a lot of schedule-processing going on...will Condor and/or condor_dagman be able to handle this?

Any advice is appreciated!  Thanks,

 - Armen

Armen Babikyan
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796