Re: [Condor-users] Jobs License Management

Ian Chesal wrote:
So many questions. I've been waiting for this for...a while! :)

I figured as much. There is a startling number of possibilities for this feature. I almost sent you a preview version to play with.

You can use Concurrency Limits from 7.1.3 [1].

Any idea when we'll see fuller documentation for this feature? I'm most
interested in what happens if a job asks for two limited resources and
another job only asks for one. How are race conditions handled to
minimize blocking? Is there a paper maybe?

The documentation should be in the released manual very shortly. There may be a paper of some form about the feature in the (not near) future.

Limits do not directly consider job priority, nor are they gathered over time to satisfy a job.

The basic use/configuration of the feature is truly quite simple.

You specify the set of limits you want associated with a job via the concurrency_limits parameter in a submit file. The limits are specified as a list, e.g. concurrency_limits = a,b,b,c - signifying a job that needs 1 A, 2 Bs and 1 C to run. Limits are case insensitive.
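To make the list semantics above concrete, here is a small Python sketch of how such a list could be read into per-limit counts. It is only an illustration of the behavior described in this message, not Condor's actual parser; the function name parse_concurrency_limits is made up for the example.

```python
from collections import Counter


def parse_concurrency_limits(value):
    """Turn a list such as "a,b,b,c" into per-limit request counts."""
    # Lowercase each name because limits are case insensitive.
    names = [name.strip().lower() for name in value.split(",")]
    # Skip empty entries left by stray commas.
    return Counter(name for name in names if name)


requested = parse_concurrency_limits("a,b,b,c")
for name, count in sorted(requested.items()):
    print(name, count)
```

Under this reading, "A,b,B" and "a,b,b" describe the same request: one A and two Bs.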

You configure limits within the Negotiator's configuration file with X_LIMIT = #, where X is the name of a limit and # is the max you want to allow at one time. For all limits that do not explicitly have a _LIMIT configuration, there is CONCURRENCY_LIMIT_DEFAULT = # to specify their maximum. The default's default is large, meaning a job requesting a limit that is not configured will not be burdened by the request. The default also allows for the possibility to limit jobs generally, such as placing a cap on the number of jobs any one user or set of users may have running at one time.
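The lookup order described above (explicit X_LIMIT first, then CONCURRENCY_LIMIT_DEFAULT) might be modeled like this in Python. This is a sketch under assumptions: the configuration is represented as a plain dict with upper-case keys, limit_maximum is a hypothetical helper, and ASSUMED_LARGE_DEFAULT is a placeholder because the real built-in default is not stated in this message.

```python
# Placeholder for "large"; the actual built-in default is not given here.
ASSUMED_LARGE_DEFAULT = 10**6


def limit_maximum(config, limit_name):
    """Return the configured maximum for a limit, falling back to the default."""
    key = limit_name.upper() + "_LIMIT"
    if key in config:
        # An explicit X_LIMIT entry wins.
        return config[key]
    # Otherwise use CONCURRENCY_LIMIT_DEFAULT, or the assumed large value.
    return config.get("CONCURRENCY_LIMIT_DEFAULT", ASSUMED_LARGE_DEFAULT)


config = {"MYLICENSE_LIMIT": 300, "CONCURRENCY_LIMIT_DEFAULT": 1000}
print(limit_maximum(config, "mylicense"))
print(limit_maximum(config, "unconfigured"))
```

With this configuration the first call uses the explicit MYLICENSE_LIMIT entry and the second falls back to CONCURRENCY_LIMIT_DEFAULT.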

The current usage of limits is indirectly accessible via condor_userprio -long.

Assign some number of licenses for use by Condor jobs, say 300.
In your Negotiator's configuration add:
MYLICENSE_LIMIT = 300

Now in each job that needs the license add:
concurrency_limits = MYLICENSE

Condor does not check out the licenses from FlexLM; it just
tries to keep the number of jobs that /will/ check out
licenses under control.

Presumably I can write a cron job (startd cron?) on my negotiator that
can update resource counts based on external factors -- is it sufficient
to do a reconfig to have the negotiator see the updated values? What
happens if I decrease a limit and there are more jobs running now than I
say I have resources? Do things preempt? Or does Condor just stop
running jobs that request this resource?

A reconfig is enough to alter configured maximums, e.g. from X_LIMIT = 1 to X_LIMIT = 100. There are some clever tricks you could play to alter the apparent usage of a limit.

Condor will not actively preempt or otherwise stop jobs when a limit is exceeded, such as if you lower it. When a limit is reached or exceeded, no new jobs requiring the limit are matched. They will be rejected with a reason specifying that a limit they requested was not available - the specific limit is not reported.
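As a rough Python model of the behavior just described: a lowered limit leaves running jobs alone and only blocks new matches. The exact admission test used by the Negotiator is not spelled out in this message, so the "usage plus request must not exceed the maximum" check below is an assumption, and can_match is a hypothetical name.

```python
def can_match(requested, usage, maximums):
    """Return True only if every requested limit still has room."""
    for name, count in requested.items():
        # Assumed check: current usage plus this request must fit the maximum.
        if usage.get(name, 0) + count > maximums[name]:
            return False
    return True


usage = {"mylicense": 5}       # five jobs holding the limit are already running
maximums = {"mylicense": 3}    # maximum lowered below current usage

# Nothing is preempted: usage stays at 5, but no new job needing the limit matches.
print(can_match({"mylicense": 1}, usage, maximums))  # False
print(can_match({}, usage, maximums))                # True
```

Jobs that request no limits are unaffected, and new jobs needing the limit can match again once enough running jobs finish to bring usage back under the maximum.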

Especially since you are sharing licenses between batch jobs
and interactive users, you should set up your jobs to notice if
they failed because they could not check out a license. This
configuration will be specific to your application, but the
document you already mentioned has a good example [2]. If
your program exits with code 52 when it fails to check out a
license, you'd add this to your job:
on_exit_remove = (ExitBySignal == TRUE) || (ExitCode != 52)

Optionally, if your license resource supports queuing, you can have your
batch jobs wait for a license instead of dying. Depending on how long
things run for and how expensive your licenses are, this can be a good
option. For example: if licenses cost far more than compute hardware, it's
better to hold the hardware and queue via the FlexLM manager for the
license, to maximize license use, than to return to Condor's queue and
undergo another negotiation cycle.

- Ian

Very good point.