
Re: [Condor-users] Understanding user priority and job preemption



condor-users-bounces@xxxxxxxxxxx wrote on 03/13/2007 07:38:30 PM:

> 
> I too am curious why you don't see the expected ratio of jobs running.
> 
> Here is one thing that may help in your condor configuration (on the 
> nodes running startds).
> 
> CLAIM_WORKLIFE = 600
> 
> This prevents the schedd's claim to the startd from lasting 
> indefinitely. Without this setting, the schedd will hold on to a claim 
> as long as it has jobs to run on it (and as long as it doesn't get 
> preempted).

Thanks for the reply. Unfortunately, I am afraid that CLAIM_WORKLIFE does 
not affect job preemption.

Today I analyzed my problem some more. In particular, I tested a variant 
without any nodes that don't match job requirements. That is, I tested 
with 20 rather than 60 total nodes.

As before, user A has priority 4 and user B has priority 8.

In the 20-node scenario, I can observe the following behavior:
1. If user A submits jobs first, taking all machines, and user B comes in 
later, then user B does not get any machines - A's jobs are never 
preempted. User B does not get machines even if I remove some running jobs 
of user A. In this case A's jobs are preferred, no matter what.
2. If user B submits jobs first, taking all machines, and user A comes in 
later, then B's jobs are preempted and the expected ratio of machines 1:2 
becomes established.

Compare this with the 60-node scenario with 20 matching nodes, described 
in my previous message:
1. User A submits first, B comes later. The effect is the same as in case 
1 above, B starves.
2. User B submits first, A comes later. Here, the expected 1:2 ratio is 
not established. Instead, ALL 20 B-jobs are preempted and replaced with 20 
A-jobs.

Based on these observations, I speculate that the following is true:
- Condor never preempts a running job of user A in favor of a job of user 
B when A.userprio < B.userprio, no matter what PREEMPTION_REQUIREMENTS is 
set to; this would explain the "insufficient priority" messages I see in 
NegotiatorLog.
- In the second scenario, Condor calculates A's pie slice as 2/3 * 60 = 40 
nodes (rather than 2/3 * 20 = 13 matching nodes) and B's pie slice as 1/3 
* 60 = 20 nodes (rather than 1/3 * 20 = 7 nodes). During negotiation 
Condor tries to satisfy A's share first because A.userprio < 
B.userprio. All 20 matching nodes are assigned to A because 20 < 40. Next, 
Condor tries to satisfy B's share, but does not find any nodes which 
match or are preemptible based on the first rule. Therefore, B gets 
nothing.
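
To make the arithmetic behind this speculation concrete, here is a small 
Python sketch. It models only the hypothesis described above, not Condor's 
actual negotiator code; the function names pie_slices and negotiate are 
made up for illustration, and the assumption that slices are proportional 
to 1/userprio is taken from the 1:2 ratio expected above.

```python
from fractions import Fraction


def pie_slices(user_prio, node_count):
    """Split node_count among users in proportion to 1/userprio.

    Hypothetical model: a numerically lower priority means a larger slice.
    Slices are rounded to whole nodes.
    """
    weights = {user: Fraction(1, prio) for user, prio in user_prio.items()}
    total = sum(weights.values())
    return {user: round(weights[user] / total * node_count) for user in weights}


def negotiate(user_prio, slices, matching_nodes):
    """Hand out matching nodes, best (lowest) userprio first.

    Each user receives at most its slice; whatever is left over goes to
    the next user in line.
    """
    free = matching_nodes
    granted = {}
    for user in sorted(user_prio, key=user_prio.get):
        granted[user] = min(free, slices[user])
        free -= granted[user]
    return granted


prio = {"A": 4, "B": 8}

# Slices computed from the whole pool (60 nodes), only 20 of which match.
pool_based = negotiate(prio, pie_slices(prio, 60), matching_nodes=20)

# Slices computed from the 20 matching nodes only.
match_based = negotiate(prio, pie_slices(prio, 20), matching_nodes=20)

print(pool_based)   # A is capped at 40, so it takes all 20 matching nodes
print(match_based)  # A is capped at 13, leaving 7 nodes for B
```

With slices based on the whole pool, A's cap (40) exceeds the number of 
matching nodes, so B is left with nothing; with slices based on the 
matching nodes only, the split is 13:7, which is roughly the expected 
ratio.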

Can anyone confirm that the above reasoning is correct?

If it is correct:
- Why is Condor assigning "pie slices" based on the total number of nodes 
in the pool rather than the total number of matching nodes?
- Is there any way to achieve the expected 1:2 ratio between two users 
competing for N specific machines of a pool with a total size of M >= 3*N?

Best regards,
Jan Ploski

--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2  - 26121 Oldenburg - Germany
Phone: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: Jan.Ploski@xxxxxxxx - URL: http://www.offis.de