Re: [Condor-users] Understanding rank, LoadAvg and ...

On tis, 2007-07-31 at 14:49 +0200, Wolf-Dieter Klotz wrote:
> Hi,
> I have two SMP (4 cpus each) machines: coral31 and coral32. In the 
> machine ClassAd of coral31 I set RANK=2.0 and in coral32
> RANK=1.0. Then I submit a job cluster with 4 jobs from coral31 without 
> setting any RANK in the job ClassAd. What happens? All 4 jobs stay at 
> coral31 as expected. Then I submit the same jobs from coral32. 3 of the 
> four jobs go to coral31 as expected but one stays on coral32 (which has 
> a lower RANK). Why? I thought that RANK expressions are deterministic - 
> I mean with my set up all 4 jobs should always go to coral31? Is there 
> something else that plays a role? Remark: on both machines I switched 
> off notification of console or tty activity.

AFAIK the RANK expression in the machine ClassAd defines how a
particular VM/slot ranks the different jobs, i.e. it is evaluated for
all candidate jobs and the one with the highest RANK value is chosen
for execution. As I understand it, this is the opposite of what you are
aiming for.

To define how a particular job ranks the VMs/slots, three different
expressions are used (in the given order): NEGOTIATOR_PRE_JOB_RANK from
the negotiator configuration, Rank from the job ClassAd, and
NEGOTIATOR_POST_JOB_RANK, also from the negotiator configuration. A
later expression only breaks ties left by the earlier ones.
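
The ordering can be pictured as a sort on a three-part key. The Python
below is only a toy model of that idea, not Condor code: the Slot class,
the order_slots helper and all the numbers are made up for illustration,
and each field stands for the value the corresponding expression would
evaluate to for one job/slot pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    pre_rank: float   # stands in for NEGOTIATOR_PRE_JOB_RANK
    job_rank: float   # stands in for Rank from the job ClassAd
    post_rank: float  # stands in for NEGOTIATOR_POST_JOB_RANK


def order_slots(slots):
    """Return slots best first; later keys only break ties in earlier ones."""
    return sorted(
        slots,
        key=lambda s: (s.pre_rank, s.job_rank, s.post_rank),
        reverse=True,
    )


# Hypothetical values: the job sets no Rank, so job_rank is 0.0 everywhere.
slots = [
    Slot("slot1@coral32", pre_rank=1.0, job_rank=0.0, post_rank=5.0),
    Slot("slot1@coral31", pre_rank=1.0, job_rank=0.0, post_rank=9.0),
    Slot("slot2@coral31", pre_rank=0.0, job_rank=0.0, post_rank=9.0),
]
ranked_names = [s.name for s in order_slots(slots)]
print(ranked_names)
```

In this model slot2@coral31 comes last despite its high post rank,
because the first key already decided against it.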

> I came to this test because users are asking that their jobs go to the 
> machines with highest throughput (MIPS) and lowest load (LoadAvg). But 
> nothing I tried really worked well. So I made this test with two 
> machines to better understand what is going on. And even the result of 
> this test does not fit my understanding. Does anyone know how to 
> reliably set up such a policy?
> Bye

In my pool I use the settings below. NEGOTIATOR_PRE_JOB_RANK ensures
that if there are unused slots matching the job's requirements, they
will be used first. NEGOTIATOR_POST_JOB_RANK states that jobs will
favour lower-numbered slots on faster machines unless the job ClassAd
says something else. The constant 1000 was chosen empirically for my
particular pool.

# Try to match unused machines first.

# Almost breadth first node allocation.
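
To see why a constant such as 1000 gives "almost" breadth-first
allocation, here is a small Python model. It assumes a post-job rank of
the form speed - penalty * slot_id. That form, the machine names and the
speed figures are assumptions for illustration only, not the actual
expressions from my configuration.

```python
def post_job_rank(speed, slot_id, penalty=1000):
    """Assumed form: faster machines score higher, higher slot numbers lower."""
    return speed - penalty * slot_id


def fill_order(machines, slots_per_machine=4, penalty=1000):
    """Return slot labels in the order a job would prefer them."""
    candidates = [
        (post_job_rank(speed, slot_id, penalty), f"slot{slot_id}@{name}")
        for name, speed in machines.items()
        for slot_id in range(1, slots_per_machine + 1)
    ]
    return [label for _, label in sorted(candidates, reverse=True)]


# Hypothetical speeds in arbitrary benchmark units.
pool = {"fast": 3000, "slow": 2500}

breadth_first = fill_order(pool, slots_per_machine=2, penalty=1000)
depth_first = fill_order(pool, slots_per_machine=2, penalty=100)
print(breadth_first)
print(depth_first)
```

With the larger penalty, slot 1 on every machine is preferred before any
slot 2. With the smaller one, the faster machine fills up completely
first. The allocation is only "almost" breadth first because a speed gap
larger than the penalty would still let a fast machine's slot 2 beat a
slow machine's slot 1, which is why the constant has to be tuned per
pool.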