
Re: [Condor-users] Condor priority model



Hi Matthew, hello Carsten,

On Wed, Sep 15, 2004 at 07:36:50AM +0100, matthew hope wrote:
> The scheduling logic is a little opaque.

nicely said :-/

> if you have a claim to a machine and the machine doesn't want to
> preempt it then you can keep on sending jobs till the cows come home.

Figure 3.4 in the manual shows the possible transitions (although it's
almost Greek to me, with its states and activities).

It still needs several minutes of 100% CPU brain work to figure out what
this means...

> the action of claiming a machine to a user being disassociated from
> the action almost certainly comes because the scheduling logic can be
> faster and in most cases it does not significantly impact users (esp
> in a preemption-> checkpoint environment)

We don't preempt/checkpoint (low bandwidth being the main reason, and
local checkpoints aren't generally a good idea IMHO)

> If your job latency is important remember that condor is geared up for
> high throughput not low latency...

In plain English: one may have to wait some time to get machine access,
but once one has it (CLAIMED the VMs), the whole cluster will be pushed
through, right? (Reminds me of that classical queue situation where the
old lady asks to pass by since she only needs one little thing, and then
remembers 1000s of others she needs too...)

Which means that high priority users (low prio factors) may have to wait
for ages if there's a low prio user with 30000 jobs who just took the
chance when the whole pool was idle. The alternatives are to set up
preemption (which can be a pain with hundreds of VMs running long jobs,
more than a week or so), or to have the low prio user go and condor_hold
her still-idle jobs. Right?

Would be nice if there was an "-idleonly" option to condor_hold. Perhaps
a suggestion for the next release? (In some cases it would also make
sense to have a count limit to condor_release so only say 500 of 30000
jobs would be released at one time...) (Of course it can be done using
some long one-liner shell script, but most of our Condor users are
users, not geeks.)
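
For the record, a rough sketch of what such a script could look like. It
assumes that condor_hold accepts -constraint and condor_q accepts -format
in the installed version, and that JobStatus 1 means idle and 5 means
held; the function names hold_idle_jobs and release_some_jobs are made up
for illustration and are not part of Condor.

```shell
# Hold only the still-idle jobs of one user (JobStatus == 1 is assumed to mean idle).
hold_idle_jobs() {
    user="$1"
    condor_hold -constraint "Owner == \"$user\" && JobStatus == 1"
}

# Release at most $2 held jobs of user $1 (JobStatus == 5 is assumed to mean held).
release_some_jobs() {
    user="$1"
    limit="$2"
    # List held jobs as "cluster.proc", one per line, and keep the first $limit.
    ids=$(condor_q -constraint "Owner == \"$user\" && JobStatus == 5" \
              -format "%d." ClusterId -format "%d\n" ProcId | head -n "$limit")
    if [ -n "$ids" ]; then
        # Unquoted on purpose: each job id becomes a separate argument.
        condor_release $ids
    fi
}
```

Something like "hold_idle_jobs alice" followed by "release_some_jobs alice 500"
would then feed the queue in batches, but that is still more than a
non-geek user should have to type.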


Cheers,
 Steffen

-- 
Steffen Grunewald * * * Merlin cluster admin (http://pandora.aei.mpg.de)
Albert-Einstein-Institut (MPI Gravitationsphysik, http://www.aei.mpg.de)
       Science Park Golm, Am Mühlenberg 1, 14476 Potsdam, Germany
e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}