Re: [Condor-users] concurrency limits



Rich Pieri wrote:
On Jun 21, 2011, at 9:10 AM, Matthew Farrellee wrote:
You can have each user put a limit in their submit file, or you
should be able to use this config...


It isn't that simple.  Again with the hypotheticals, let's say that
you have a Condor pool and a software package with 50 floating
licenses -- that is, the pool can have up to 50 instances of that
package running concurrently.  Alice wants to start a large,
long-running (days or weeks) job and she is authorized to use all 50
licenses.  She submits her job with a concurrency limit of 45
licenses (she is a considerate user) and her processes start running.


You come along a day or two later and want to reduce Alice's
concurrency limit to 30 licenses.  How do you propose to have Condor
make 15 of Alice's actively running processes release their licenses?

We specify a unique concurrency limit per DAGMan job group, relying on the limit default, for exactly this sort of situation. Instead of licenses, the scarce resource could be I/O ops or anything else you're starved on that makes you want to reduce the number of running jobs for a group. Our operators then reduce the concurrency limit on the job group and condor_vacate_job a few jobs to push the group back into its box.
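
To make the arithmetic of that operator step concrete, here is a minimal Python sketch of deciding how many jobs to vacate after a limit is lowered. It is not our actual tooling: the function names, the (job id, start time) tuples, and the "vacate the most recently started jobs first" policy are all assumptions for illustration, and the sketch only picks job ids; it does not call condor_vacate_job itself.

```python
from typing import List, Tuple


def jobs_to_vacate(running: int, new_limit: int) -> int:
    """Return how many running jobs exceed the new concurrency limit."""
    if running < 0 or new_limit < 0:
        raise ValueError("running and new_limit must be non-negative")
    return max(0, running - new_limit)


def pick_jobs_to_vacate(jobs: List[Tuple[str, int]], new_limit: int) -> List[str]:
    """Choose which job ids to vacate so the group fits under new_limit.

    jobs is a list of (job_id, start_time) pairs for the running jobs.
    Assumed policy (hypothetical): vacate the most recently started jobs
    first, since they have the least work to lose.
    """
    excess = jobs_to_vacate(len(jobs), new_limit)
    newest_first = sorted(jobs, key=lambda job: job[1], reverse=True)
    return [job_id for job_id, _ in newest_first[:excess]]


if __name__ == "__main__":
    # Alice's case from the quoted question: 45 running, limit cut to 30.
    alice = [(f"1234.{proc}", 1_000 + proc) for proc in range(45)]
    victims = pick_jobs_to_vacate(alice, new_limit=30)
    print(len(victims), "job ids to hand to condor_vacate_job")
```

The ids returned would then be passed to condor_vacate_job by whatever wrapper the operators use.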

Our wishlist item for this area is a way to configure multiple concurrency limit defaults, selected by limit-name pattern, e.g.:

CONCURRENCY_LIMIT_DEFAULT = 500
CONCURRENCY_LIMIT_DEFAULT_LIST = bigio,mayalicense
CONCURRENCY_LIMIT_DEFAULT_LIST_bigio = 10
CONCURRENCY_LIMIT_DEFAULT_LIST_mayalicense = 50
MAYA_LIMIT = 500

A concurrency limit of "jg5000" for job group 5000 would start with a value of 500 and serve as the general group limit.

A concurrency limit of "bigio.jg5000" would start with a value of 10, and we'd label job groups that are I/O-intensive with it.

A concurrency limit of "mayalicense.jg5000" would start with a value of 50. We could tag jobs with both that limit and the "maya" concurrency limit to restrict those jobs to at most 50 Maya licenses out of a theoretical pool of 500.
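
The lookup we have in mind for those three cases can be sketched in Python as follows. This only illustrates the proposed wishlist semantics, not anything Condor does here; the function name resolve_limit and the dictionary arguments are hypothetical, and treating the text before the first "." as the pattern prefix is an assumption drawn from the examples above.

```python
from typing import Dict


def resolve_limit(
    name: str,
    explicit: Dict[str, int],
    prefix_defaults: Dict[str, int],
    global_default: int,
) -> int:
    """Return the starting value for a concurrency limit name.

    Order of precedence in this sketch (an assumption):
      1. an explicitly configured limit, e.g. "maya" from MAYA_LIMIT
      2. a per-prefix default, where the prefix is the text before
         the first "." in the limit name
      3. the global default
    """
    if name in explicit:
        return explicit[name]
    prefix, sep, _ = name.partition(".")
    if sep and prefix in prefix_defaults:
        return prefix_defaults[prefix]
    return global_default


if __name__ == "__main__":
    explicit = {"maya": 500}
    prefix_defaults = {"bigio": 10, "mayalicense": 50}
    for limit in ("jg5000", "bigio.jg5000", "mayalicense.jg5000", "maya"):
        print(limit, resolve_limit(limit, explicit, prefix_defaults, 500))
```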

We do that today by changing concurrency limit configurations rapidly via our own tooling.

We also give our operators tools to scale the limit knobs up and down, which today means changing configuration rapidly. It would be preferable for limits to be stored in the negotiator's AccountantNew.log rather than in configuration, because that would make our operator tools easier to write: we *have* had operators get syntax errors through our tools and into configuration before. Still, the system generally works for us.
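
As an illustration of the kind of guard that keeps syntax errors out of configuration, here is a minimal Python sketch that validates a limit setting before rendering the line to be written. It is not our actual tool: render_limit_line is a hypothetical name, and the accepted name pattern (a letter followed by letters, digits, or underscores) is a deliberately strict assumption rather than Condor's own rule.

```python
import re

# Assumed, deliberately strict pattern for simple limit names.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def render_limit_line(name: str, value: int) -> str:
    """Validate a limit name and value, then render a <NAME>_LIMIT line."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid limit name: {name!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"limit value must be a non-negative integer: {value!r}")
    return f"{name.upper()}_LIMIT = {value}"


if __name__ == "__main__":
    print(render_limit_line("maya", 500))
    print(render_limit_line("jg5000", 30))
```

Rejecting bad input before anything is written means an operator typo fails in the tool rather than in the negotiator's configuration.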

Hope that helps you out.

-- Lans Carstensen, Systems Engineering, Dreamworks Animation
Vision without execution is hallucination.  --  Thomas Edison