
Re: [Condor-users] FW: job submitting to powerful machines.



Ian,
Thanks a lot for your well-documented reply. Sorry to keep knocking at
your door, but it's hard to find someone with your willingness to teach
and instruct. I got some clarifications, but other doubts have arisen. I
got completely lost on these two points:
a)  NEGOTIATOR_POST_JOB_RANK: where should I change this value, in the
central manager's condor_config file or across the board on all the
machines?
b) Effective user Priority and Real User Priority, specifically at this
part: "*User priority* is a pool-wide metric kept and calculated by
Condor that guides the division of the slots between multiple users in
your system according to Condor's fair share negotiation algorithm.
*Effective user priority* is calculated using the internal-to-Condor
"Real User Priority" number -- which measures the user's resource usage
throughout time -- the priority factor set for the user in the system by
the administrator, and some additional policy factors like a nice user
setting and a remote users multiplier."
Thanks again for your time and knowledge,
Sincerely,
Alex


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Monday, February 02, 2009 4:47 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] FW: job submitting to powerful machines.

> Ian,
> I guess my question for both cases would it be: what
> is the difference of using the parenthesis?

The short answer: order of operations. There's an order of operations
when evaluating an expression, much like BEDMAS in algebra. Parentheses
help guide the order of operations. Operator precedence for ClassAd
expressions is laid out in detail here:

http://www.cs.wisc.edu/condor/manual/v7.0/4_1Condor_s_ClassAd.html#SECTION00511300000000000000

So let's look at how this applies to your two scenarios:

> REQUIREMENTS = OpSys == "WINNT52" && machine == "machine1.domain.com"
> || machine == "machine2.domain.com"

This actually reads, once you take operator precedence into account:

    Run on:
        A machine running WINNT52 that's named machine1.domain.com
        -or-
        A machine named machine2.domain.com

The && gets evaluated before the || because it has higher precedence.
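
If it helps to see it concretely, here is a rough sketch in Python rather than ClassAd syntax. Python's "and" binds tighter than "or" in the same way && binds tighter than ||. The machine ads are made-up dicts, machine3 and the LINUX value are my own additions for illustration, and the second function is only my guess at what you may have intended:

```python
# Hypothetical machine ads, modelled as plain dicts (not real ClassAds).
machines = [
    {"OpSys": "WINNT52", "Machine": "machine1.domain.com"},
    {"OpSys": "LINUX", "Machine": "machine2.domain.com"},
    {"OpSys": "WINNT52", "Machine": "machine3.domain.com"},
]


def as_written(ad):
    """The expression with no parentheses: the 'and' is evaluated first."""
    is_nt52 = ad["OpSys"] == "WINNT52"
    is_m1 = ad["Machine"] == "machine1.domain.com"
    is_m2 = ad["Machine"] == "machine2.domain.com"
    return is_nt52 and is_m1 or is_m2


def with_parentheses(ad):
    """Assumed intent: WINNT52 on either of the two named machines."""
    is_nt52 = ad["OpSys"] == "WINNT52"
    is_m1 = ad["Machine"] == "machine1.domain.com"
    is_m2 = ad["Machine"] == "machine2.domain.com"
    return is_nt52 and (is_m1 or is_m2)


for ad in machines:
    print(ad["Machine"], as_written(ad), with_parentheses(ad))
```

The un-parenthesised form accepts machine2 even though it isn't running WINNT52 in this made-up pool, which is exactly the "-or-" reading above.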

It's also worth mentioning that if you omit Arch from the Requirements
expression in your submit file, Condor *automatically* adds a constraint
that restricts your job to the same Arch as your submitting machine. You
can see the full Requirements expression for a job using:

    condor_q <cluster>.<job> -format "%s\n" Requirements

Thus armed, let's dive into your two scenarios:

> 1) I have two different scenarios or cases:
> Case A:
> When I submit jobs using  the requirements between the parenthesis,
> my central manager kind of balance the jobs between all the machines,
> placing one job on only one CPU per machine.
<snip>
> REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") ||
> (Arch == "INTEL" && OpSys == "WINNT52")

The parentheses make the order of evaluation explicit. This reads:

        Run on:
                An INTEL machine running WINNT51
                -or-
                An INTEL machine running WINNT52

Since you've specified Arch and OpSys in your Requirements expression,
Condor shouldn't add anything to your requirements for you.

Condor's fill policy is controlled by the negotiator, specifically by
the NEGOTIATOR_POST_JOB_RANK setting. See:

http://www.cs.wisc.edu/condor/manual/v7.0/3_3Configuration.html#14868

It sounds like your system is set to prefer breadth-first filling. So if
you have 2 jobs and 4 free slots that match both jobs, 2 slots on each of
2 machines, one job will run in slot 1 on machine A and the other job
will run in slot 1 on machine B. If you'd prefer depth-first filling of
your system you can change NEGOTIATOR_POST_JOB_RANK to something like:

NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * KFlops

This will break ties between matches by preferring faster machines that
are unclaimed, with no attention paid to the slot ID on which the job
would run.
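
To show what that expression does to the ordering, here's a small Python sketch. It only mimics the arithmetic of the rank expression and ignores everything else the negotiator takes into account. The slot names, KFlops numbers and the owner are all invented, and None stands in for UNDEFINED:

```python
UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value

# Hypothetical slot ads: machineA is the faster box, one machineB slot is claimed.
slots = [
    {"Name": "slot1@machineA", "KFlops": 900000, "RemoteOwner": UNDEFINED},
    {"Name": "slot2@machineA", "KFlops": 900000, "RemoteOwner": UNDEFINED},
    {"Name": "slot1@machineB", "KFlops": 500000, "RemoteOwner": UNDEFINED},
    {"Name": "slot2@machineB", "KFlops": 500000, "RemoteOwner": "alex@domain.com"},
]


def post_job_rank(slot):
    """Mimics (RemoteOwner =?= UNDEFINED) * KFlops: claimed slots rank 0."""
    unclaimed = slot["RemoteOwner"] is UNDEFINED
    return int(unclaimed) * slot["KFlops"]


# Higher rank is preferred, so sort in descending order of rank.
ranked = sorted(slots, key=post_job_rank, reverse=True)
names = [slot["Name"] for slot in ranked]
print(names)
```

With two jobs to place, both slots on machineA come out ahead of anything on machineB, so the faster machine fills up first.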

So I have a question for you: when you observe this first-slot-only
behaviour, is the number of jobs in the queue far smaller than the total
number of slots available on your machines? Breadth-first versus
depth-first filling of slots shouldn't matter much, if at all, if there
are more jobs to run than slots: every slot should get a job and the
steady state for your system should be all slots running jobs.

> Case B:
> If I remove the parenthesis I will place the jobs on all the available
> CPU that the central manager can find as possible matches.  Am I right or
> wrong?
<snip>
> REQUIREMENTS = Arch == "INTEL" && OpSys == "WINNT51" ||
> Arch == "INTEL" && OpSys == "WINNT52"

This reads the same as the example in Case A. The &&'s evaluate first
and the || is true if either && condition on the left or right side of
the expression is true.

You will see the same filling behaviour of your system with both Case A
and Case B. Personally I think the Requirements string in Case A is
easier to read so that's how I'd write it.
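
If you want to convince yourself the two forms really agree, a brute-force check is easy to write. This is a Python sketch, not ClassAd evaluation; it relies on "and" binding tighter than "or", the same way && binds tighter than ||. The X86_64 and LINUX values are just extra test inputs I made up:

```python
from itertools import product

arches = ["INTEL", "X86_64"]
opsyses = ["WINNT51", "WINNT52", "LINUX"]


def case_a(arch, opsys):
    # With parentheses, as in Case A.
    return (arch == "INTEL" and opsys == "WINNT51") or \
           (arch == "INTEL" and opsys == "WINNT52")


def case_b(arch, opsys):
    # Without parentheses, as in Case B.
    return arch == "INTEL" and opsys == "WINNT51" or \
           arch == "INTEL" and opsys == "WINNT52"


# Collect every input on which the two forms disagree.
mismatches = [
    (arch, opsys)
    for arch, opsys in product(arches, opsyses)
    if case_a(arch, opsys) != case_b(arch, opsys)
]
print(mismatches)
```

An empty list means the two expressions gave the same answer on every combination tried.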

> 2) What is the difference between user priority and job priority?
> Which one overrides the other?

Job priority is the value you set in the submit file when you submit
your jobs, like this:

priority = 10

or change afterwards using condor_prio. It is a way *one single user*
can control the running order of their own jobs. Let's say I have 10
clusters in the queue: 1 through 10. All clusters have 10 jobs in them
and the jobs in all the clusters are exactly the same: same
requirements, same target OS, etc. All of them start out with priority
0. In this case Condor will, more or less, start the jobs in the order
they were submitted: 1.1, 1.2, 1.3, ... , 2.1, 2.2, 2.3, ... , 10.8,
10.9, 10.10

If I wanted cluster 10 to run before all the others I'd set its priority
higher than the other clusters:

        condor_prio +10 10

This raises the priority of all the jobs in cluster 10 by 10.
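
As a toy model of that ordering (a Python sketch only: the real start order also depends on matchmaking, hence my "more or less" above, and the job numbering simply follows my example), think of one user's jobs as sorted by priority first and submission order second:

```python
# One user's queue: clusters 1..10 with 10 jobs each, all at priority 0.
jobs = [
    {"cluster": c, "proc": p, "prio": 0}
    for c in range(1, 11)
    for p in range(1, 11)
]


def bump_cluster(queue, cluster, delta):
    """Return a copy of the queue with one cluster's priority raised by delta."""
    return [
        dict(job, prio=job["prio"] + delta) if job["cluster"] == cluster else dict(job)
        for job in queue
    ]


def run_order(queue):
    """Higher priority first, then the order the jobs were submitted in."""
    ordered = sorted(queue, key=lambda j: (-j["prio"], j["cluster"], j["proc"]))
    return ["%d.%d" % (j["cluster"], j["proc"]) for j in ordered]


before = run_order(jobs)
after = run_order(bump_cluster(jobs, 10, 10))
print(before[:3], after[:3])
```

After the bump, the cluster 10 jobs move to the front and clusters 1 through 9 keep their original relative order behind them.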

*User priority* is a pool-wide metric kept and calculated by Condor that
guides the division of the slots between multiple users in your system
according to Condor's fairshare negotiation algorithm. *Effective user
priority* is calculated using the internal-to-Condor "Real User
Priority" number -- which measures the user's resource usage throughout
time -- the priority factor set for the user in the system by the
administrator, and some additional policy factors like a nice user
setting and a remote users multiplier.

The ratio between effective user priorities is what determines how many
machines a user can claim with their jobs. A numerically lower EUP is
better: a user with EUP=5 will get twice as many resources as a user
with EUP=10.

See:

http://www.cs.wisc.edu/condor/manual/v7.0/3_4User_Priorities.html#SECTION00441000000000000000

For the details from the manual.
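
To put rough numbers on the EUP ratio, here's a simplified Python sketch. It assumes EUP is the real user priority multiplied by the priority factor and that each user's share is proportional to 1/EUP. The user names and values are invented, and the real negotiator does a good deal more than this:

```python
def effective_priority(real_priority, priority_factor):
    """Simplified model: EUP = real user priority * priority factor."""
    return real_priority * priority_factor


def fair_shares(eups):
    """Split the pool in proportion to 1/EUP (a lower EUP gets a larger share)."""
    weights = {user: 1.0 / eup for user, eup in eups.items()}
    total = sum(weights.values())
    return {user: weight / total for user, weight in weights.items()}


# Hypothetical users with the EUP values from the example above.
shares = fair_shares({"user_a": 5.0, "user_b": 10.0})
for user, share in sorted(shares.items()):
    print(user, round(share, 3))
```

Here user_a ends up with roughly two thirds of the slots and user_b with one third, which is the "twice as many resources" ratio.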


Hope that helps!

- Ian
