
Re: [Condor-users] Understanding user priority and job preemption





Jan Ploski wrote:
condor-users-bounces@xxxxxxxxxxx wrote on 03/13/2007 07:38:30 PM:

I too am curious why you don't see the expected ratio of jobs running.

Here is one thing that may help in your condor configuration (on the nodes running startds).

CLAIM_WORKLIFE = 600

This prevents the schedd's claim to the startd from lasting indefinitely. Without this setting, the schedd will hold on to a claim as long as it has jobs to run on it (and as long as it doesn't get preempted).

Thanks for the reply. Unfortunately, I am afraid that CLAIM_WORKLIFE does not affect job preemption.

Today I analyzed my problem some more. In particular, I tested a variant without any nodes that don't match job requirements. That is, I tested with 20 rather than 60 total nodes.

As before, user A has priority 4 and user B has priority 8.

In the 20-node scenario, I can observe the following behavior:
1. If user A submits jobs first, taking all machines, and user B comes in later, then user B does not get any machines; A's jobs are never preempted. User B does not get machines even if I remove some of user A's running jobs. In this case, A's jobs are preferred, no matter what.

This is the scenario in which CLAIM_WORKLIFE should decrease the amount of time it takes to balance out the share of the pool. Without limiting the lifespan of claims, it is expected that user A will retain 100% of the pool in the case you describe.

2. If user B submits jobs first, taking all machines, and user A comes in later, then B's jobs are preempted and the expected ratio of machines 1:2 becomes established.

Compare this with the 60-node scenario with 20 matching nodes, described in my previous message:
1. User A submits first, B comes later. The effect is the same as in case 1 above: B starves.
2. User B submits first, A comes later. Here, the expected 1:2 ratio does not set in. Instead, ALL 20 B-jobs are preempted and replaced with 20 A-jobs.

Based on these observations, I speculate that the following is true:
- Condor never preempts a running job of user A in favor of a job of user B when A.userprio < B.userprio, no matter what PREEMPTION_REQUIREMENTS is set to; this would explain the "insufficient priority" messages I see in NegotiatorLog.
- In the second scenario, Condor calculates A's pie slice as 2/3 * 60 = 40 nodes (rather than 2/3 * 20 = 13 matching nodes) and B's pie slice as 1/3 * 60 = 20 nodes (rather than 1/3 * 20 = 7 nodes). During negotiation, Condor tries to satisfy A's contingent first because A.userprio < B.userprio. All 20 matching nodes are assigned to A because 20 < 40. Next, Condor tries to satisfy B's contingent, but does not find any nodes which match or are preemptible based on the first rule. Therefore, B gets nothing.

Can anyone confirm that the above reasoning is correct?

Yes.  You are correct.

If it is correct:
- Why is Condor assigning "pie slices" based on the total number of nodes in the pool rather than the total number of matching nodes?

The negotiator (as currently implemented) does not have a big list of all the jobs from all the users. It just has a list of submitters (i.e. users), and it only ever considers one job at a time when making matchmaking decisions.

- Is there any way to achieve the expected 1:2 ratio between two users competing for N specific machines of a pool with a total size of M >= 3*N?

If you know in advance that users A and B will only ever be able to run on N machines within your pool, then you could use group quotas to specify their share of the N machines. Here's more information on how to set that up:

http://www.cs.wisc.edu/condor/manual/v6.8/3_4User_Priorities.html#SECTION00446000000000000000
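As a sketch of what that negotiator-side configuration might look like (group names and quota numbers here are placeholders for your setup, following the group-quota macros documented in the 6.8 manual):

```
# Define accounting groups and their static machine quotas
# (negotiator configuration; numbers chosen for a 2:1 split of 20 nodes).
GROUP_NAMES = group_a, group_b
GROUP_QUOTA_group_a = 13
GROUP_QUOTA_group_b = 7
```

Jobs would then opt into a group in their submit files, e.g. with a line like +AccountingGroup = "group_a.userA".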

For the general case where N is very dynamic, depending on job requirements, I cannot think of a configuration solution.

I hope that helps.

--Dan