
Re: [Condor-users] uniform distribution of processes on physical nodes



On Mon, Jul 7, 2008 at 3:11 PM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
>
>
> Matt Hope wrote:
>
>>On Thu, Jul 3, 2008 at 4:45 PM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>>
>>
>>>>What's the best way to achieve a uniform distribution of
>>>>processes on the physical nodes, so that, in the previous
>>>>example, each physical node runs two processes?
>>>>
>>>>
>>>##  The NEGOTIATOR_POST_JOB_RANK expression chooses between
>>>##  resources that are equally preferred by the job.
>>>##  The following example expression steers jobs toward
>>>##  faster machines and tends to fill a cluster of multi-processors
>>>##  breadth-first instead of depth-first.  In this example,
>>>##  the expression is chosen to have no effect when preemption
>>>##  would take place, allowing control to pass on to
>>>##  PREEMPTION_RANK.
>>>##
>>>##  Break ties by looking for machines that have been Idle longer than others
>>>##  and use them first. Also try and use faster machines before slower
>>>##  machines and assign jobs to separate machines before we start
>>>##  putting two jobs on a machine.
>>>NEGOTIATOR_POST_JOB_RANK = (((Activity =?= "Owner") * (State =?=
>>>"Idle")) * 1000000000) + ((Activity =?= "Unclaimed") * 100000000) +
>>>(KFlops * 0.001) - (VirtualMachineID * 10)
>>>
>>>If you just want breadth-first filling:
>>>
>>>NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) *
>>>VirtualMachineID
>>>
>>>That'll fill the higher VMs first.
>>>
>>>
>>
>>Incidentally, on the version we are running (6.8.8) this did not have
>>the desired effect (I had assumed it was working and was rather
>>shocked to discover it wasn't, and that we were wasting a lot of
>>throughput).
>>
>>
>
> What aspect of this policy was not working as expected?  The preference
> for idle machines?  Or the uniform distribution across machines?
>
> One thing to keep in mind about NEGOTIATOR_POST_JOB_RANK is that it may
> be overruled by NEGOTIATOR_PRE_JOB_RANK or the rank specified by the
> job.  Only if those expressions leave more than one match tied for top
> place does the post job rank come into play.

Sorry - I should have been clearer - we were actually using
NEGOTIATOR_PRE_JOB_RANK.

NEGOTIATOR_PRE_JOB_RANK = $(UWCS_NEGOTIATOR_PRE_JOB_RANK)
(which is the default, i.e. RemoteOwner =?= UNDEFINED)
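
As I understand the ordering (slot names and owner values below are made
up, just to illustrate), that pre rank only sorts claimed slots below
unclaimed ones; any remaining ties fall through to the job's own Rank and
then to NEGOTIATOR_POST_JOB_RANK:

NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
##  vm1@nodeA: RemoteOwner undefined    -> TRUE  (1)
##  vm2@nodeA: RemoteOwner = "someone"  -> FALSE (0)
##  vm1@nodeB: RemoteOwner undefined    -> TRUE  (1)
##  vm1@nodeA and vm1@nodeB tie at 1, so choosing between them is left to
##  the job's Rank and then to the post job rank; with no post rank the
##  tie-break is effectively arbitrary, as far as I can tell.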

I have however just spotted that we are also using:
NEGOTIATOR_POST_JOB_RANK = $(UWCS_NEGOTIATOR_POST_JOB_RANK)
but UWCS_NEGOTIATOR_POST_JOB_RANK is not defined. I wonder if this is
the source of the bad behaviour...
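
A quick way to check would be to run condor_config_val
NEGOTIATOR_POST_JOB_RANK (and the same for UWCS_NEGOTIATOR_POST_JOB_RANK)
on the central manager to see what the negotiator actually expands it to.
If it really is empty, I suppose the obvious thing to try is setting it
explicitly to Ian's breadth-first expression, e.g. (untested on our
6.8.8 pool):

NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * VirtualMachineID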

By bad behaviour I mean new jobs ending up on machines that are already
happily running jobs, even though there are empty machines they could
run on. In fact, it looked like the busy machines were actively being
targeted.

I'll find out.

Matt