
Re: [Condor-users] counting licenses



Matt Hope wrote:

On 5/4/05, Joshua Kolden <joshua@xxxxxxxxxxxxxxxxx> wrote:


I need to have a dynamic expression in which I can tick off license use
independently of the CPU I'm running on. Are dynamic expressions like
this even possible?



Dynamic user-defined expressions are not really supported by Condor itself.


Hmm, that's too bad. Most of the render queue technology we use in visual effects production has this functionality, and I had always thought of it as a core feature.

Conceptually (ignore implementation or physical location in this part) you need:

A 'licence server' which hands out a licence, takes it back and
reports how many are free to use.
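A minimal sketch of the three operations described above (hand out, take back, report free), written as a toy in-memory counter. The class name `LicencePool` and its methods are hypothetical and are not part of Condor:

```python
class LicencePool:
    """Toy in-memory licence counter (hypothetical, not a Condor component)."""

    def __init__(self, total):
        self.total = total      # number of licences owned
        self.holders = set()    # job ids currently holding a licence

    def free(self):
        """Report how many licences are free to use."""
        return self.total - len(self.holders)

    def acquire(self, job_id):
        """Hand out a licence; return False if none are left."""
        if job_id in self.holders:
            return True         # already holds one, nothing to do
        if self.free() == 0:
            return False
        self.holders.add(job_id)
        return True

    def release(self, job_id):
        """Take a licence back; releasing twice is harmless."""
        self.holders.discard(job_id)


pool = LicencePool(total=2)
results = [pool.acquire(j) for j in ("job1", "job2", "job3")]
pool.release("job1")
```

With two licences, the third request is refused until one of the first two jobs releases its licence.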


In other applications there is simply a way to define local resources and global resources. A CPU would be a local resource, while network load might be a global resource. Advanced queue systems will even allow you to run a program (or a plug-in) that will assess the current state of the global resource. I might consume one of each on a particular job, for example, by taking a CPU from a local resource and a software use from the global resource list. But the global resource list can be anything you like: you just define a variable and a count, and in your job an expression to consume it.
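As a rough illustration of the local/global split, here is a sketch in which a job is started only when both its per-machine and its queue-wide requirements fit. The function `try_start`, the dictionary layout, and the resource names are all assumptions made up for this example and do not correspond to any particular queue system:

```python
def try_start(job, local, global_pool):
    """Start a job only if every local and global requirement fits.

    On success the required counts are subtracted from both tables.
    On failure nothing is consumed.
    """
    needs_local = job.get("local", {})
    needs_global = job.get("global", {})

    fits_local = all(local.get(k, 0) >= n for k, n in needs_local.items())
    fits_global = all(global_pool.get(k, 0) >= n for k, n in needs_global.items())
    if not (fits_local and fits_global):
        return False

    for k, n in needs_local.items():
        local[k] -= n
    for k, n in needs_global.items():
        global_pool[k] -= n
    return True


# One floating "maya" seat shared by the whole queue, two CPUs per node.
global_pool = {"maya": 1}
node_a = {"cpu": 2}
node_b = {"cpu": 2}
render = {"local": {"cpu": 1}, "global": {"maya": 1}}

started = [
    try_start(render, node_a, global_pool),
    try_start(render, node_b, global_pool),
]
```

The second job is not launched even though `node_b` has idle CPUs, because the global count is already used up.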

Ignore for now the smp licensing and just think of the 'easy' solution
where 1 job == 1 licence.

Say that you have X free nodes and Y free licences where X >> Y.
Condor will simply schedule X jobs (since any externally inserted
attributes will be unchanged on the scheduling pass). It might then be
able (with some horrific expressions) to make it kill jobs to come
back down to Y but I would think that any progress to a steady state
would be torturous and wasteful, not to mention prone to race
conditions with whatever external process was updating the licences...


Yes, this would be silly. It's far better to simply not launch a job if a resource requirement is completely consumed. If I want to share a resource with an interactive session, such as Maya on the queue vs. Maya on my desktop, I simply put a wrapper script around the maya command that tells Condor it is forcing the consumption of a resource, and therefore Condor must drop the lowest-priority job using that resource, in exactly the same way that a CPU is recaptured by a user returning to their machine.
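The forced-consumption idea could be sketched roughly as follows. This is only an illustration of the policy, not something Condor provides: `force_consume`, the job dictionaries, and the convention that a smaller number means lower priority are all assumptions for the example.

```python
def force_consume(resource, capacity, running):
    """Claim one unit of `resource` for an interactive user.

    `running` is a list of dicts with 'name', 'priority' and 'uses' keys.
    If a unit is still free, nothing is evicted and None is returned.
    Otherwise the lowest-priority job using the resource is removed from
    `running` and its name is returned.
    """
    users = [job for job in running if resource in job["uses"]]
    if len(users) < capacity:
        return None  # a unit is free, no eviction needed
    victim = min(users, key=lambda job: job["priority"])
    running.remove(victim)
    return victim["name"]


running = [
    {"name": "shotA", "priority": 5, "uses": {"maya"}},
    {"name": "shotB", "priority": 1, "uses": {"maya"}},
    {"name": "comp", "priority": 0, "uses": {"comp_tool"}},
]

# Two maya seats, both busy on the queue; a desktop session wants one.
evicted = force_consume("maya", 2, running)
remaining = [job["name"] for job in running]
```

Jobs that do not use the contested resource are left alone, even if their priority is lower.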

Realistically Condor could do with a condor_licence daemon and a
simple way for jobs to indicate that they required the licence. Condor
would then be able to control the whole thing much more effectively.
Of course making this a reality is a slightly more complex issue...


This is A) overly complex and B) too specific. A general solution that has a non-CPU-constrained resource that can be consumed by an expression would be more consistent with the ClassAd idea.

I know you don't want to hear this but at the moment any dynamic
solution you come up with is likely to be more wasteful/error prone
than simply assigning a set of machines as Maya-enabled and having
jobs requiring these machines preempt any non-Maya jobs running on
them.


This is unfortunate. I had just assumed that there was a dynamic expression system, and I really can't see how the queue can function properly without it. How, for example, do you keep too many jobs from running at once and thrashing a network? How do you consume disk space in a quota environment? How do you handle any licensed software that floats?

j