
Re: [Condor-users] Jobs License Management



Klaus,
As a quick note, if you aren't using 7.1.3 and you only have one
license-type to manage, you can schedule all jobs that require it on
one scheduler, and use MAX_JOBS_RUNNING to constrain the number of
running jobs that require that license-type.

On concurrency limits, I wanted to add some important features/cases
that could be very useful.

One of the primary use cases we've seen over and over again with
Condor pools is managing jobs that require *multiple* licenses. Sun
Grid Engine users create "consumable" resources for licenses, which
are similar to concurrency_limits. Jobs can require access to multiple
of these resources.

You can now do this within Condor 7.1.3. This version manages not only
one limit, like Matt's example, but also manages multiple constraint
limits at the same time. For example, let's say you have a base
application, like a renderer, and a set of plug-ins for that
application. You have different numbers of licenses for each, so now
you can say in the Negotiator Config:
# three licenses for the application
APPNAME_LIMIT = 3
# one license each for the plug-ins
APPNAME_PLUGIN_A_LIMIT = 1
APPNAME_PLUGIN_B_LIMIT = 1

You can now specify multiple limits in your jobs using:
concurrency_limits = APPNAME, APPNAME_PLUGIN_A
or
concurrency_limits = APPNAME, APPNAME_PLUGIN_B

Also, if you have a job that uses two of the application's licenses,
you could say:
concurrency_limits = APPNAME, APPNAME
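
To make the counting concrete, here is a rough Python sketch of the
bookkeeping described above. It is only an illustration of the idea, not
Condor's actual negotiator code: the function names are made up, and it
assumes that a name with no configured limit is unlimited and that a
name listed twice counts twice, as in the example above.

```python
from collections import Counter


def parse_limits(expr):
    """Split a concurrency_limits value such as 'APPNAME, APPNAME_PLUGIN_A'
    into a count per limit name (a repeated name counts more than once)."""
    return Counter(name.strip() for name in expr.split(",") if name.strip())


def can_start(job_expr, limits, in_use):
    """Return True if every limit the job names still has enough headroom.

    Assumption: names missing from `limits` are treated as unlimited.
    """
    needed = parse_limits(job_expr)
    return all(
        in_use.get(name, 0) + count <= limits.get(name, float("inf"))
        for name, count in needed.items()
    )


def start(job_expr, limits, in_use):
    """Claim the job's limits if it fits; return whether it was started."""
    if not can_start(job_expr, limits, in_use):
        return False
    in_use.update(parse_limits(job_expr))
    return True


# Limits from the example above: three application licenses, one per plug-in.
limits = {"APPNAME": 3, "APPNAME_PLUGIN_A": 1, "APPNAME_PLUGIN_B": 1}
in_use = Counter()

for job in ["APPNAME, APPNAME_PLUGIN_A",   # fits
            "APPNAME, APPNAME_PLUGIN_A",   # blocked: plug-in A already in use
            "APPNAME, APPNAME",            # fits: uses the remaining two
            "APPNAME, APPNAME_PLUGIN_B"]:  # blocked: no application license left
    print(job, "->", "start" if start(job, limits, in_use) else "wait")
```

The point is that a job only starts when *all* of the limits it names
have room, so the plug-in license and the application license are
checked together.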

Plus, these limits don't apply solely to licenses (as mentioned earlier,
they don't check licenses out of FlexLM). You can also use them to cap
other kinds of load-sensitive resources. For example, to keep the load
on a database healthy, you can constrain the number of jobs that
connect to it by telling the negotiator:
PRODUCTION_DATABASE_LIMIT = 40
and tagging jobs that use the DB with:
concurrency_limits = ..., PRODUCTION_DATABASE

Similarly, Condor already has the FileSystemDomain attribute, but with
concurrency limits you can also constrain the number of jobs
concurrently accessing a filer to ensure it isn't overwhelmed:
FILER_A_LIMIT = 1000

Then within jobs you can tag the resources that the job requires to
ensure that your grid runs within healthy limits:
concurrency_limits = APPNAME, PRODUCTION_DATABASE, FILER_A

It is important to realize that over-constraining can lead to
scheduling issues/starvation, and this system also *requires* that all
jobs which use resources honestly report them in the
concurrency_limits tag.
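
As a quick way to see how constraints stack up, the sketch below works
out the most copies of one kind of job that could run at once under a
simple counting model. This is a back-of-the-envelope illustration, not
anything Condor provides: max_concurrent is a made-up helper, it ignores
other jobs competing for the same limits, and it skips names that have
no configured limit.

```python
from collections import Counter


def max_concurrent(job_expr, limits):
    """Upper bound on simultaneous copies of a job with this
    concurrency_limits value, ignoring all other jobs in the pool.

    Returns None when none of the named limits is configured.
    """
    needed = Counter(n.strip() for n in job_expr.split(",") if n.strip())
    caps = [limits[name] // count
            for name, count in needed.items()
            if name in limits]
    return min(caps) if caps else None


# Limits taken from the examples in this message.
limits = {"APPNAME": 3, "PRODUCTION_DATABASE": 40, "FILER_A": 1000}

# The tightest limit wins: only 3 such jobs can run, however large the
# database and filer limits are.
print(max_concurrent("APPNAME, PRODUCTION_DATABASE, FILER_A", limits))  # prints 3
```

The more limits a job names, the more likely it is that one of them is
full at any given moment, which is where the starvation risk comes from.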

Personally I am very excited about this feature, as it is one we and
various other folks have been requesting/bumping into for the past
couple of years. Should be applicable to a number of interesting uses.

Hope this helps...

Good Luck!
Jason

-- 
===================================
Jason A. Stowe
cell: 607.227.9686
main: 888.292.5320

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com

On Tue, Oct 14, 2008 at 4:51 PM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
> You can use Concurrency Limits from 7.1.3 [1].
>
> Assign some number of licenses for use by Condor jobs, say 300. In your
> Negotiator's configuration add: MYLICENSE_LIMIT = 300
>
> Now in each job that needs the license add: concurrency_limits = MYLICENSE
>
> Condor does not check out the licenses from Flexlm, it just tries to
> keep the number of jobs that /will/ check out licenses under control.
>
> Especially since you are sharing licenses between batch jobs and
> interactive users you should setup your jobs to notice if they failed
> because they did not checkout a license. This configuration will be
> specific to your application, but the document you already mentioned has
> a good example [2]. If your program exits with code 52 when it fails to
> checkout a license you'd add this to your job: on_exit_remove =
> (ExitBySignal == TRUE) || (ExitCode != 52)
>
> Best,
> matt
>
> [1] http://www.cs.wisc.edu/condor/manual/v7.1/8_3Development_Release.html#SECTION00931000000000000000
> [2] http://www.cs.wisc.edu/condor/techpaper/licenses.html


