
Re: [Condor-users] maxidle for a dag with one node?



On Thu, 22 Sep 2011, Rob de Graaf wrote:

> I have a job consisting of one cluster with several hundred thousand processes. The individual processes use $(Process) as an argument. I can't submit them all at once, so I made a DAG with one JOB node and tried to use condor_submit_dag's -maxidle throttling capability. According to the manual, each individual process counts as a job, so this matches what I want to do, but it doesn't seem to work; the entire cluster is submitted regardless of what I set -maxidle to. I've also tried -maxjobs just in case, but that does what it says and throttles whole clusters, not the processes within.

Some of the subtle differences between maxidle and maxjobs have been difficult to explain -- I'll take another shot at it...

First of all, keep in mind that DAGMan only controls things at the granularity of submit files. In other words, if DAGMan submits a submit file that has 'queue 10' in it, you get a cluster with 10 procs, and DAGMan doesn't try to do anything to the individual procs.
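For example, a node's submit file like this (file and program names are made up) produces a single 10-proc cluster that DAGMan submits, and later counts, as one unit:

```
# node.sub (hypothetical) -- DAGMan submits this file as one DAG node;
# the 10 procs it queues are invisible to DAGMan's throttling decisions
executable = myprog
arguments  = $(Process)
output     = out.$(Process)
queue 10
```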

One of the differences between maxidle and maxjobs, though, is how things are counted towards the specified total. If you have a submit file that has 'queue 10' and DAGMan submits it, and all of the procs are idle, that counts as 10 towards the maxidle limit. But if you have maxjobs set, the same cluster counts as only 1 towards that limit. Once DAGMan hits either limit, it simply stops submitting any more node jobs; it doesn't remove or hold specific jobs or procs that are already in the queue.

> Is there a way to throttle processes in a single-node DAG? I realize that I could split the cluster into many single-process clusters and use -maxjobs, but then I wouldn't be able to use $(Process) anymore. Ideally I'd like to avoid having to generate that many submit files.

Are you using $(process) for things like output file names? You can do something similar with arbitrary names in your submit file that have values assigned in the DAG file:
http://www.cs.wisc.edu/condor/manual/v7.7/2_10DAGMan_Applications.html#SECTION003106200000000000000
That would allow you to use the same submit file for many nodes in your DAG -- would that solve the problem for you?
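As a sketch (the file, node, and macro names here are made up), the DAG file assigns a per-node value with VARS, and the one shared submit file references it the way it previously referenced $(Process):

```
# big.dag -- one single-proc node per former proc, all sharing common.sub
JOB N0 common.sub
VARS N0 procid="0"
JOB N1 common.sub
VARS N1 procid="1"
```

```
# common.sub -- uses the DAG-supplied macro in place of $(Process)
executable = myprog
arguments  = $(procid)
output     = out.$(procid)
queue 1
```

With one proc per node, -maxidle and -maxjobs both throttle at the level you want.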

(Basically, if you want DAGMan's throttling to work right you have to have a small number of procs per submit file, ideally one per submit file...)
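Since hand-writing hundreds of thousands of JOB/VARS pairs isn't practical, a small script can generate the DAG file. This is a minimal sketch assuming the VARS approach above, with made-up node and macro names:

```python
# Sketch: emit a DAG with one single-proc node per "process", all sharing
# one submit file, so DAGMan's -maxidle/-maxjobs throttle per process.
# Node names (N0, N1, ...) and the "procid" macro are arbitrary choices.
def make_dag(n_procs, submit_file="common.sub"):
    lines = []
    for i in range(n_procs):
        lines.append("JOB N{0} {1}".format(i, submit_file))
        lines.append('VARS N{0} procid="{0}"'.format(i))
    return "\n".join(lines) + "\n"

# Usage: write the DAG file, then run condor_submit_dag -maxidle <n> big.dag
# with open("big.dag", "w") as f:
#     f.write(make_dag(300000))
```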

Kent Wenger
Condor Team