
Re: [Condor-users] Looking for something like CpuBusy for Disk



Hi Diane,

On Thu, Sep 2, 2010 at 6:23 PM, Diane Trout <diane@xxxxxxxxxxx> wrote:
I was wondering if there was anything like the CpuBusy macro that I could use to throttle launching new processes if the disks were too busy?

There isn't anything in Condor to do this out-of-the-box but you could run Startd cron jobs on your machines that periodically run iostat or vmstat and return a "disk activity" number that gets stored and set as a ClassAd attribute on the machine's ad.
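As a rough sketch of what such a Startd Cron probe might look like, here is a small Python script that samples /proc/diskstats twice and prints a single "Attribute = Value" line on stdout, which is the form Startd Cron reads. Everything in it is an assumption for illustration: the script path, the job name DISKPROBE, the 0-100 scale, and the choice of the kernel's "time spent doing I/O" counter instead of parsing iostat output. The configuration knobs in the docstring are written from memory, so check them against the manual for your Condor version.

```python
#!/usr/bin/env python3
"""Hypothetical Startd Cron probe that publishes CurrentDiskActivity (0-100).

Assumed configuration on each execute machine (names and paths are
illustrative; verify the knob names for your Condor version):

    STARTD_CRON_JOBLIST = DISKPROBE
    STARTD_CRON_DISKPROBE_EXECUTABLE = /usr/local/bin/disk_probe.py
    STARTD_CRON_DISKPROBE_PERIOD = 60s
"""
import sys
import time

# Column in /proc/diskstats holding "milliseconds spent doing I/O"
# (0-based, after the major, minor and device-name columns).
IO_TICKS_FIELD = 12


def parse_io_ticks(text):
    """Map each device name to its cumulative I/O-busy milliseconds."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > IO_TICKS_FIELD:
            ticks[fields[2]] = int(fields[IO_TICKS_FIELD])
    return ticks


def busiest_percent(before, after, interval_ms):
    """Percent of the interval the busiest device spent doing I/O."""
    deltas = [after[dev] - before[dev] for dev in after if dev in before]
    if not deltas or interval_ms <= 0:
        return 0
    percent = round(100 * max(deltas) / interval_ms)
    return max(0, min(100, percent))


def main(path="/proc/diskstats", interval_s=1.0):
    try:
        with open(path) as f:
            before = parse_io_ticks(f.read())
        time.sleep(interval_s)
        with open(path) as f:
            after = parse_io_ticks(f.read())
    except OSError as exc:
        # Publish nothing if the statistics are unavailable.
        print("disk probe failed: %s" % exc, file=sys.stderr)
        return 1
    activity = busiest_percent(before, after, interval_s * 1000)
    print("CurrentDiskActivity = %d" % activity)
    return 0


if __name__ == "__main__":
    main()
```

Taking the busiest device rather than a sum avoids double-counting partitions alongside their parent disk. Machines that do not run the probe will not have the attribute at all, so any expression that references it will evaluate to undefined there.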

If you called the attribute CurrentDiskActivity and higher meant more disk I/O load, you could steer your jobs away from heavily loaded machines with:

rank = 1/CurrentDiskActivity

in your submit files. Or you can keep them off heavily loaded machines altogether with:

requirements = CurrentDiskActivity < <some threshold value>
 
Other possibilities:
 * Is there an easy way in a condor_submit script to limit the number of simultaneous jobs?

There's no throttling on condor_submit but there are two ways to achieve this:

1. You can use concurrency limits to prevent more than N jobs that use a resource from running at the same time. Concurrency limits are explained here: http://www.cs.wisc.edu/condor/manual/v7.4/3_13Setting_Up.html#35978

2. You can use DAGMan and submit a single-node DAG. DAGMan can throttle the width of a DAG at any node in the graph. For information about DAGMan see: http://www.cs.wisc.edu/condor/manual/v7.4/condor_dagman.html#61297
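To make the first option a little more concrete, here is a minimal sketch of a Python helper that writes a submit description in which every job claims one unit of a concurrency limit. The limit name disk_io, the executable convert.sh, and the input file names are all made up for illustration. The limit itself would be defined in the central manager's configuration with a line such as DISK_IO_LIMIT = 4, where 4 is an arbitrary example value. For the DAGMan route, the equivalent throttle is the -maxjobs option of condor_submit_dag.

```python
def build_submit(executable, input_files, limit_name="disk_io"):
    """Return a submit description with one job per input file.

    Every job claims one unit of `limit_name`, so the pool runs at most
    <LIMIT_NAME>_LIMIT of them at once (set on the central manager).
    """
    lines = [
        "universe = vanilla",
        "executable = %s" % executable,
        "concurrency_limits = %s" % limit_name,
        "",
    ]
    for path in input_files:
        lines.append("arguments = %s" % path)
        lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical conversion script and input files.
    print(build_submit("convert.sh", ["a.dat", "b.dat"]), end="")
```

You would redirect the output to a file and hand that file to condor_submit as usual. All of the jobs can then be submitted at once, and the limit decides how many of them run at the same time.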
 
 * Or can I query the load average of a different machine in my START expression?

If you use the Startd Cron approach I mentioned above, you could make the attribute you publish part of your machine's START expression. But tread carefully here. Switching START from True to False while jobs are running can have the unintended side effect of vacating them, because Condor thinks the machine has been taken over and should be in the Owner state. It would be better to reference the attribute in your job's Requirements expression.
 
Or am I missing some better way of dealing with multi-gigabyte file format conversions?

Nope. You've got your bases covered. There are a few ways to approach it, and which one works best is up to you.

- Ian



Cycle Computing, LLC
The Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com