
Re: [Condor-users] Better control over negotiator?



> On Fri, Feb 08, 2008 at 10:03:01AM -0500, Ian Chesal wrote:
> > So we fill our machines "width first". If the system is empty jobs
> > start on all the slot1@<machine> locations then start filling up the
> > slot2@<machine> locations. That way, in a system with light load, the
> > jobs get to run on as free a machine as possible. We do this with:
> 
> IMHO this would efficiently lock out occasional big jobs.

There's more to how we run our system than just what I'm showing you
here. In practice there's no problem. We actually have slot1 on our
machines configured smaller than slot2; that is, we don't distribute the
resources on our machines evenly between all slots.
 
> > ALTERA_NEGOTIATOR_POST_JOB_RANK = (((Activity =?= 'Owner') * (State =?=
> > 'Idle')) * 1000000000) + ((Activity =?= 'Unclaimed') * 100000000) +
> > (KFlops * 0.001) - (VirtualMachineID * 10)
> 
> Why would one check for "Owner" during negotiation?

A machine with a complex START expression will show as state Owner if
the START expression references job ad attributes. You need to evaluate
START in the context of a *specific* job ad in order to determine whether
the machine is really available to run jobs. When doing the PRE and POST
job rank we found that the specific job ad was not being used, and
machines where a job could run, but which were Owner because of a complex
START expression, were being ignored. This fixed the problem.

> Your expression would favour slot 1 over slot 2. Not what we're
> looking for.

No, but you can see how we're controlling the fill pattern here. Adjust
to suit your needs. 
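
As a rough illustration of the fill pattern, the last two terms of the
POST_JOB_RANK expression can be modelled in plain Python. This is only a
sketch of the arithmetic, not of how the negotiator works internally. The
machine names "a" and "b" and the KFlops values are made up, and the
State/Activity terms are left out on the assumption that they are equal
for all four slots.

```python
def tie_break(kflops, slot_id):
    """Last two terms of the rank: (KFlops * 0.001) - (VirtualMachineID * 10)."""
    return kflops * 0.001 - slot_id * 10

# Hypothetical pool: two machines with identical KFlops, two slots each.
slots = [
    ("slot1@a", 1000000, 1),
    ("slot2@a", 1000000, 2),
    ("slot1@b", 1000000, 1),
    ("slot2@b", 1000000, 2),
]

# A higher rank is preferred. sorted() is stable, so equal ranks keep list order.
fill_order = [name for name, kflops, slot_id in
              sorted(slots, key=lambda s: tie_break(s[1], s[2]), reverse=True)]
print(fill_order)
```

With equal KFlops, both slot1 entries rank above both slot2 entries. Note
that a large enough KFlops difference between machines would outweigh the
slot term, so the weights may need adjusting for a mixed pool.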

> > ALTERA_NEGOTIATOR_PRE_JOB_RANK = (((Activity =?= 'Owner') * (State =?=
> > 'Idle')) * 1000000000) + ((Activity =?= 'Unclaimed') * 100000000)
> 
> same as above: if a slot is in Owner/Idle state, it's not matchable.
> Confusing.

Confusing: yes. Not matchable: no. Take a machine that has some complex
START expression that references its local classad attributes and then,
at the end of START, adds a job ad attribute that limits the use of the
machine to only some users. It's a custom attribute that may or may not
show up in all jobs, so we use =?=:

START = ... && (TARGET.CustomOwner =?= "ichesal")

If that machine is unclaimed and I do a condor_status on it, it shows up
as Owner+Idle because that last bit evaluates as False (or maybe
unknown, I'm not sure) when there's no job ad to evaluate it against.

A job *could* run there though if it has CustomOwner defined and set to
"ichesal" in its classad. Our negotiator sort makes sure these machines
are considered.
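
To make the =?= behaviour concrete, here is a small Python model of that
last START clause. It is a sketch that assumes the documented ClassAd
semantics of =?= (the result is always True or False, never undefined).
The UNDEFINED sentinel, the helper names and the sample job ads are
invented for illustration.

```python
UNDEFINED = object()  # stand-in for a ClassAd attribute that is not defined

def meta_equals(a, b):
    """Model of ClassAd =?=: always yields True or False, never undefined."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b

def custom_owner_clause(job_ad):
    """Model of the clause: TARGET.CustomOwner =?= "ichesal"."""
    return meta_equals(job_ad.get("CustomOwner", UNDEFINED), "ichesal")

no_job = {}                                  # no job ad to evaluate against
other_job = {"CustomOwner": "someone_else"}  # hypothetical other user
my_job = {"CustomOwner": "ichesal"}

results = [custom_owner_clause(ad) for ad in (no_job, other_job, my_job)]
print(results)
```

Only the job that defines CustomOwner as "ichesal" satisfies the clause.
With no job ad the clause is False, which is consistent with the slot
showing as Owner in condor_status.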

- Ian

