
[Condor-users] Complex license handling



We have a truckload of PVM jobs that we'd like to submit to our Condor cluster. Each job uses a license server to determine whether it can run at a given time (we have a finite number n of licences).
Only some of our machines are suitable to be PVM master nodes, so we have a custom machine attribute for this (PVM_Master = True).
Nothing dramatic so far, but we want to make sure that two jobs are never sent to the same machine at the same time (to avoid PVM I/O hell), so we introduced a LoadAvg-type constraint ("submit only to PVM_Master machines with LoadAvg < 0.3"). Not the most elegant solution, but it worked for a while...
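To make the current rule concrete, here is a minimal Python model of the matching we rely on. It is a sketch of the logic only, not Condor's actual ClassAd evaluation; the `Machine` class, the machine names and the load values are hypothetical, while `PVM_Master` and the 0.3 threshold come from our setup.

```python
from dataclasses import dataclass

LOAD_THRESHOLD = 0.3  # the "LoadAvg < 0.3" part of our constraint


@dataclass
class Machine:
    name: str          # hypothetical machine name
    pvm_master: bool   # our custom PVM_Master attribute
    load_avg: float    # LoadAvg as advertised at negotiation time


def matches(m: Machine) -> bool:
    """Current rule: PVM_Master machines with LoadAvg < 0.3 only."""
    return m.pvm_master and m.load_avg < LOAD_THRESHOLD


machines = [
    Machine("node01", True, 0.05),   # idle master: matches
    Machine("node02", True, 1.2),    # busy master: rejected
    Machine("node03", False, 0.0),   # not a master: rejected
]
print([m.name for m in machines if matches(m)])  # ['node01']
```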

The trouble now is that job load varies greatly and can drop to almost zero for minutes at a time (between calculation iterations, when only file I/O is happening). We'd like to make sure that the Condor Negotiator doesn't start thinking "there's a PVM_Master machine with a low LoadAvg, let's give it a new PVM job". How can we achieve that?
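A made-up load trace shows the trap. The sketch below assumes one LoadAvg sample per minute from a machine that is already running a PVM job; the numbers are invented for illustration, not measured on our cluster.

```python
LOAD_THRESHOLD = 0.3

# Invented LoadAvg samples, one per minute, from a PVM_Master machine that is
# already running a PVM job: computing, then an I/O phase, then computing again.
load_trace = [1.9, 2.0, 1.8, 0.2, 0.1, 0.1, 1.9, 2.1]


def looks_idle(load_avg: float) -> bool:
    """The instantaneous test our constraint effectively applies."""
    return load_avg < LOAD_THRESHOLD


# Minutes at which a negotiation cycle would see this busy machine as free.
false_idle = [minute for minute, load in enumerate(load_trace) if looks_idle(load)]
print(false_idle)  # [3, 4, 5]
```

Any negotiation cycle that happens to fall in minutes 3 to 5 would hand this already busy machine a second PVM job.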

I've looked into:
* alternative ways of calculating LoadAvg (over longer periods of time - not sure this can be done within Condor)
* using group quotas (something like a PVM group with n machines for our n licences)
* using Master/Worker (not sure how)
* wrapping the job into a DAGMan (a bit clumsy, IMHO)
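For the first two ideas, a rough Python sketch of the logic we have in mind. It models the rules only, not any Condor configuration, and the names `WindowedLoad`, `idle_flags`, `can_start` and the window length are hypothetical. A machine counts as idle only if its load stayed below the threshold for a whole window; separately, and independently of load, a job may start only on a machine with no PVM job running and only while fewer than n licences are in use.

```python
from collections import deque
from typing import Dict, List

LOAD_THRESHOLD = 0.3
WINDOW = 5  # hypothetical; must be longer than the longest I/O gap between iterations


class WindowedLoad:
    """Keeps the last `window` LoadAvg samples of one machine."""

    def __init__(self, window: int = WINDOW):
        self.samples = deque(maxlen=window)

    def add(self, load_avg: float) -> None:
        self.samples.append(load_avg)

    def idle(self) -> bool:
        # Idle only once the window is full and every sample in it is below
        # the threshold, so a short dip between iterations is not enough.
        return (len(self.samples) == self.samples.maxlen
                and max(self.samples) < LOAD_THRESHOLD)


def idle_flags(trace: List[float], window: int = WINDOW) -> List[bool]:
    """Replays a load trace and records the idle verdict after each sample."""
    w = WindowedLoad(window)
    flags = []
    for load in trace:
        w.add(load)
        flags.append(w.idle())
    return flags


def can_start(running: Dict[str, int], machine: str, licences: int) -> bool:
    """Load-independent rule: at most one PVM job per machine, at most n in total."""
    return running.get(machine, 0) == 0 and sum(running.values()) < licences


busy_trace = [1.9, 2.0, 1.8, 0.2, 0.1, 0.1, 1.9, 2.1]
print(any(idle_flags(busy_trace)))                      # False
print(can_start({"node01": 1}, "node01", licences=4))   # False
print(can_start({"node01": 1}, "node02", licences=4))   # True
```

The windowed check only works if the window is longer than the longest I/O gap; the counting check does not look at LoadAvg at all.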

Any wisdom from the crowd? I'd be happy to provide more info if needed.
Thanks in advance
François