Re: [Condor-users] FW: job submitting to powerful machines.
- Date: Mon, 2 Feb 2009 16:46:42 -0500
- From: Ian Chesal <ICHESAL@xxxxxxxxxx>
- Subject: Re: [Condor-users] FW: job submitting to powerful machines.
> I guess my question for both cases would it be: what
> is the difference of using the parenthesis?
The short answer: order of operations. Expressions are evaluated according to an order of operations, much like BEDMAS in algebra, and parentheses guide that order. Operator precedence for ClassAd expressions is laid out in detail here:
So let's look at how this applies to your two scenarios:
> REQUIREMENTS = OpSys == "WINNT52" && machine == "machine1.domain.com" || machine == "machine2.domain.com"
Once you take operator precedence into account, this actually reads:
A machine running WINNT52 that's named machine1.domain.com
OR
A machine named machine2.domain.com
The && gets evaluated before the || because it has higher precedence.
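To make that concrete, here is a small Python sketch. Python's `and` and `or` have the same relative precedence as && and || in ClassAd expressions, so it can stand in for the evaluation. The function names are mine, the "intended" version is only my guess at what you meant, and Python has no notion of ClassAd's UNDEFINED, so treat this as an illustration rather than a model of the matchmaker:

```python
def unparenthesized(opsys, machine):
    # Same shape as the Requirements line above: && binds tighter than ||
    return opsys == "WINNT52" and machine == "machine1.domain.com" or machine == "machine2.domain.com"


def intended(opsys, machine):
    # Presumably what was meant: WINNT52 on either of the two machines
    return opsys == "WINNT52" and (machine == "machine1.domain.com" or machine == "machine2.domain.com")


# A hypothetical non-WINNT52 box that happens to be named machine2.domain.com
print(unparenthesized("LINUX", "machine2.domain.com"))  # True: the OpSys test is skipped
print(intended("LINUX", "machine2.domain.com"))         # False
```

Without the parentheses, machine2.domain.com matches no matter what operating system it runs.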
It's also worth mentioning that if you omit Arch from the Requirements string in your submit file, Condor *automatically* adds a clause to your requirements that restricts your job to the same Arch as your submitting machine. You can see the full Requirements expression for a job using:
condor_q <cluster>.<job> -f "%s\n" Requirements
Thus armed, let's dive into your two scenarios:
> 1) I have two different scenarios or cases:
> Case A:
> When I submit jobs using the requirements between the parenthesis,
> my central manager kind of balance the jobs between all the machines,
> placing one job on only one CPU per machine.
> REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || (Arch == "INTEL" && OpSys == "WINNT52")
The brackets help you guide the order of evaluation. This reads:
An INTEL machine running WINNT51
OR
An INTEL machine running WINNT52
Since you've specified Arch and OpSys in your Requirements string, Condor shouldn't add anything to your requirements for you.
Condor's fill policy is controlled by the negotiator, specifically by NEGOTIATOR_POST_JOB_RANK. See:
It sounds like your system is set to prefer breadth-first filling. So if you have 2 jobs and both match 4 free slots, 2 on each of 2 machines, one job will run in slot 1 on machine A and the other will run in slot 1 on machine B. If you'd prefer depth-first filling of your system, you can change NEGOTIATOR_POST_JOB_RANK to something like:
NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * KFlops
This breaks ties between matches by preferring faster machines that are unclaimed, with no attention paid to the slot ID on which the job would run.
So I have a question for you: when you observe this first-slot-only behaviour, is the number of jobs in the queue much smaller than the total number of slots available on your machines? The breadth-first or depth-first filling of slots shouldn't matter much, if at all, when there are more jobs to run than slots: every slot should get a job, and the steady state for your system should be all slots running jobs.
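As a rough sketch of what that expression does, here it is in Python. The slot names, KFlops values, and owner are made up, and I'm assuming the comparison counts as 1 when true and 0 when false, the way the multiplication in the expression implies. The real negotiator only applies this rank to slots that already matched the job and tied on the earlier ranks:

```python
def post_job_rank(slot):
    # (RemoteOwner =?= UNDEFINED): 1 for an unclaimed slot, 0 for a claimed one
    unclaimed = 1 if slot.get("RemoteOwner") is None else 0
    return unclaimed * slot["KFlops"]


# Hypothetical slots on two machines, a (slower) and b (faster)
slots = [
    {"Name": "slot1@a", "KFlops": 900000, "RemoteOwner": None},
    {"Name": "slot2@a", "KFlops": 900000, "RemoteOwner": "alice"},
    {"Name": "slot1@b", "KFlops": 1200000, "RemoteOwner": None},
    {"Name": "slot2@b", "KFlops": 1200000, "RemoteOwner": None},
]

# Higher rank is preferred; slot IDs play no part in the ordering
ranked = sorted(slots, key=post_job_rank, reverse=True)
print([s["Name"] for s in ranked])
```

Both slots on the faster machine come out ahead of the unclaimed slot on the slower one, and the claimed slot ranks last.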
> Case B:
> If I remove the parenthesis I will place the jobs on all the available CPU
> that the central manager can find as possible matches. Am I right or wrong?
> REQUIREMENTS = Arch == "INTEL" && OpSys == "WINNT51" || Arch == "INTEL" && OpSys == "WINNT52"
This reads the same as the example in Case A. The &&'s evaluate first and the || is true if either && condition on the left or right side of the expression is true.
You will see the same filling behaviour of your system with both Case A and Case B. Personally I think the Requirements string in Case A is easier to read so that's how I'd write it.
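If you'd like to convince yourself that Case A and Case B are equivalent, a brute-force check is easy to sketch in Python, whose `and`/`or` follow the same relative precedence as &&/||. The extra Arch and OpSys values are just sample values I picked so that some combinations fail to match:

```python
from itertools import product


def case_a(arch, opsys):
    return (arch == "INTEL" and opsys == "WINNT51") or (arch == "INTEL" and opsys == "WINNT52")


def case_b(arch, opsys):
    return arch == "INTEL" and opsys == "WINNT51" or arch == "INTEL" and opsys == "WINNT52"


archs = ["INTEL", "X86_64"]
opsyses = ["WINNT51", "WINNT52", "LINUX"]

# Collect every combination where the two expressions disagree
mismatches = [(a, o) for a, o in product(archs, opsyses) if case_a(a, o) != case_b(a, o)]
print(mismatches)  # [] : no combination is treated differently
```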
> 2) What is the difference between user priority and job priority? Which one overrides the other?
Job priority, the one you set in the submit file when you submit your jobs like this:
priority = 10
or the one you set using condor_prio, is a way *one single user* can control the running order of their own jobs. Let's say I have 10 clusters in the queue: 1 through 10. All clusters have 10 jobs in them and the jobs in all the clusters are exactly the same: same requirements, same target OS, etc. All of them start out with priority 0. In this case Condor will, more or less, start the jobs in the order they were submitted (job numbers within a cluster start at 0): 1.0, 1.1, 1.2, ... , 2.0, 2.1, 2.2, ... , 10.7, 10.8, 10.9
If I wanted cluster 10 to run before all the others I'd set its priority higher than the other clusters:
condor_prio +10 10
This raises the priority of all the jobs in cluster 10 by 10.
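Here is a simplified Python sketch of the resulting order for one user's jobs. It assumes the only things that matter are job priority (higher first) and then submission order, which is the "more or less" picture above rather than the schedd's exact logic:

```python
# (cluster, proc, priority) for 10 clusters of 10 jobs each, all at priority 0.
# Condor numbers the jobs within a cluster starting from 0.
jobs = [(cluster, proc, 0) for cluster in range(1, 11) for proc in range(10)]

# Emulate `condor_prio +10 10`: add 10 to every job in cluster 10
jobs = [(c, p, prio + 10 if c == 10 else prio) for c, p, prio in jobs]

# Higher priority first, then cluster and proc in submission order
order = sorted(jobs, key=lambda j: (-j[2], j[0], j[1]))

print(order[0])   # (10, 0, 10): cluster 10 now leads
print(order[10])  # (1, 0, 0): cluster 1 follows once cluster 10 is done
```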
*User priority* is a pool-wide metric kept and calculated by Condor that guides the division of the slots between multiple users in your system according to Condor's fairshare negotiation algorithm. *Effective user priority* is calculated using the internal-to-Condor "Real User Priority" number -- which measures the user's resource usage throughout time -- the priority factor set for the user in the system by the administrator, and some additional policy factors like a nice user setting and a remote users multiplier.
The ratio between effective user priorities (EUPs) is what determines how many machines a user can claim with their jobs. Lower is better: a user with EUP=5 will get twice as many resources as a user with EUP=10.
See the manual for the details.
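A back-of-the-envelope way to turn EUPs into shares, assuming each user's share is simply proportional to 1/EUP and that every user has enough idle jobs to use it. The user names and the `fair_shares` helper are made up for illustration; the negotiator's real accounting has more moving parts:

```python
def fair_shares(eups):
    # Weight each user by the inverse of their effective user priority
    weights = {user: 1.0 / eup for user, eup in eups.items()}
    total = sum(weights.values())
    return {user: w / total for user, w in weights.items()}


shares = fair_shares({"alice": 5.0, "bob": 10.0})
for user, share in shares.items():
    print(f"{user}: {share:.1%} of the pool")
```

With these numbers alice ends up with about two thirds of the pool and bob with about one third, which is the 2:1 ratio described above.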
Hope that helps!