
Re: [Condor-users] Looking for something like CpuBusy for Disk



Ian Chesal wrote:
On Fri, Sep 3, 2010 at 2:30 PM, Lans Carstensen <Lans.Carstensen@xxxxxxxxxxxxxx> wrote:

    The seriously cool way would be to standardize a global resource
    classad type and allow for expression logic to have a way to
    directly address all resource ads that are deemed to apply to a
    particular slot and job.


That certainly takes it a step further. To that end: a way to put *any* type of ad into the collector would be really interesting. Then you could collect and reference anything you want and use it in matchmaking and execution decisions.

Baby steps though. :) I don't want to dream my way out of this being possible.

Agreed.

    This reminds me that the other way we address the common use case
    for this class of problem is with concurrency limits on jobs - but
    that doesn't handle slot-level use cases, only job-level ones.

    Partitionable slot-level resource/concurrency limits would be a
    useful addition towards that goal.  Then you might want to apply
    some feedback mechanism to make the upper bound of that
    partitionable slot dynamic based upon some business logic (like a
    global resource classad).


Ahh...it's fun to dream. In all honesty though, while it's a bit cumbersome to set up the first time, Startd cron jobs can handle a lot of what we're dreaming about here when it comes to slot-level concurrency controls. And once you get one set up, the rest follow pretty easily.

True, startd cron jobs can be (and have been) used to do all sorts of things, but I haven't seen one applied (successfully) to host-level concurrency limits. The periodic "cron" nature makes resource counters unsafe for resource reservation: the advertised count is only as fresh as the last run, so several jobs can be admitted against the same stale value. Do you have an example of one of those, or is it (as I currently believe) a gap in functionality that you end up having to build custom slot types around?
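
The concern about periodically refreshed counters can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how Condor itself schedules: the function names, the per-period `arrivals` batches, and the limit of 3 are all hypothetical, and jobs never finish in this model. It compares admitting jobs against a count that is republished once per period with admitting them against a counter that is updated on every start.

```python
LIMIT = 3  # hypothetical cap on concurrent SAN IO jobs per host


def admit_with_periodic_snapshot(arrivals, limit=LIMIT):
    """Admit jobs using a count that is only refreshed once per period.

    `arrivals` lists how many jobs try to start within each period.
    """
    published = 0  # value last advertised by the periodic probe
    running = 0    # true number of running jobs
    for batch in arrivals:
        for _ in range(batch):
            # The admission decision only sees the stale published value.
            if published < limit:
                running += 1
        # The probe fires at the end of the period and republishes.
        published = running
    return running


def admit_with_live_counter(arrivals, limit=LIMIT):
    """Admit jobs using a counter that is updated on every job start."""
    running = 0
    for batch in arrivals:
        for _ in range(batch):
            if running < limit:
                running += 1
    return running


arrivals = [5, 2]  # five start attempts in the first period, two in the next
overshoot = admit_with_periodic_snapshot(arrivals)
bounded = admit_with_live_counter(arrivals)
print(overshoot, bounded)
```

In this model the snapshot version admits every job that arrives before the first refresh, so the cap is exceeded, while the live counter stops at the limit.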

For instance, say I have a SAN-attached host and want to allow no more than 3 concurrent SAN IO jobs while also allowing other job types. Today I'd have to set up a special partitionable slot with a SAN attribute and a start expression that only allows SAN jobs, dedicating something like 3 CPUs and some amount of RAM to that partitionable slot. Or make 3 SAN slots with dedicated memory resources. And then add a partitionable slot for all remaining CPU, memory, and local disk resources. There's no way to apply a "SAN" resource counter, and no way to alter the number "3" live based on actual SAN link utilization or storage subsystem latency. Right?

If you have a startd cron for that class of use case, we'd be interested in seeing it. Other competing resource schedulers have host-based resource counters for this reason.
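
To make the idea of a host-based resource counter concrete, here is a minimal sketch of the semantics being asked for: a per-host count with an upper bound that can be changed while jobs are running. It is purely illustrative and is not an existing Condor feature or API; the class `HostResourceCounter` and its methods are hypothetical names.

```python
import threading


class HostResourceCounter:
    """Hypothetical host-level counter with a limit that can change live."""

    def __init__(self, limit):
        self._lock = threading.Lock()
        self._limit = limit
        self._in_use = 0

    def try_acquire(self):
        """Reserve one unit if the current limit allows it."""
        with self._lock:
            if self._in_use < self._limit:
                self._in_use += 1
                return True
            return False

    def release(self):
        """Give back one unit when a job finishes."""
        with self._lock:
            if self._in_use > 0:
                self._in_use -= 1

    def set_limit(self, limit):
        """Change the upper bound, e.g. from observed SAN latency.

        Running jobs are not evicted; only new acquisitions are affected.
        """
        with self._lock:
            self._limit = max(0, limit)


san = HostResourceCounter(limit=3)
granted = [san.try_acquire() for _ in range(4)]  # fourth request is refused

san.set_limit(2)                   # feedback lowers the cap while 3 jobs run
san.release()                      # one job finishes, 2 still running
after_shrink = san.try_acquire()   # refused: already at the new cap

san.release()                      # another job finishes, 1 still running
after_release = san.try_acquire()  # granted: below the new cap again
```

The check and the increment happen under one lock, so a reservation is never made against a stale count, and `set_limit` is where some business logic could feed back into the bound.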

-- Lans Carstensen