
Re: [Condor-users] Fair Resource Allocation question



Thanks, Ian, for the prompt and clear reply. A few more questions:

1.      Is there an easy way to control and set the EUP for each user prior to the negotiation cycle? For example, I would like to ensure that all users have the same EUP.

2.      How can the negotiation cycle be prevented from running automatically (presumably via the condor_config file), and how can it be triggered manually from the command line? (A sketch of commands touching on both questions follows below.)
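For reference, a hedged sketch of commands relevant to both questions, assuming condor_userprio and the daemon-control tools behave as described in the 7.x manual; the user name and values here are placeholders:

# Question 1: inspect and adjust per-user priorities
condor_userprio -allusers                          # show EUPs and priority factors for all known submitters
condor_userprio -setfactor alice@example.com 1.0   # give a user a specific priority factor
condor_userprio -resetusage alice@example.com      # clear a user's accumulated usage

# Question 2: stop automatic negotiation, then drive it by hand
condor_off -negotiator       # stop the negotiator daemon
condor_on -negotiator        # start it again; it negotiates shortly after starting
condor_reschedule            # ask the schedd to request a negotiation cycle sooner than the normal interval
# Alternatively, set NEGOTIATOR_INTERVAL in condor_config to a very large value
# and run condor_reconfig, so cycles effectively only happen on demand.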

Thanks,

Yuval.  

--

Yuval Leader

Design Automation Engineer, Mellanox Technologies

mailto: leader@xxxxxxxxxxxx

Tel: +972-74-7236360     Fax: +972-4-9593245

Beit Mellanox, 6th Floor, R-620

P.O.Box 586, Yokneam Industrial Park, 20692 Israel

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Wednesday, August 22, 2012 6:42 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Fair Resource Allocation question

 

Hi Yuval,

On Wednesday, 22 August, 2012 at 7:35 AM, Yuval Leader wrote:

As a new user, I would like to understand how Condor will handle my expected allocation scenarios below.

 

1. There are 4 users, A, B, C, D.

2. Each user owns 50 machines and they all agree to pool them together into a 200-machine Condor pool.

3. For simplicity, all machines are the same, all jobs have the same requirements all the time, and all users have the same user priority.

4. Assume also that preemption is disabled.

5. Case #1: All 4 users submit exactly N jobs each (where N is less than or equal to 50). The expected behavior after the first negotiation cycle is that each user will get exactly N machines (1 per job).

Yes, this is what you'll see, provided you add one more assumption: all four users have the exact same effective user priority (EUP). See: http://research.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html#25902

 

If they all have the same EUP, they'll all get exactly 1/4 of the system after one negotiation cycle assuming everything about their jobs is equal.
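If you want to verify the EUPs before a cycle runs, they are visible from the command line (a quick sketch, assuming condor_userprio is available on a submit or central manager host):

condor_userprio -all           # effective priority, priority factor and accumulated usage per submitter
condor_userprio -allusers      # also include submitters with no recent usage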

 

This is easy enough to test.

 

I queued up 10 sleep jobs from each of four users in a new pool that has four slots available in it. None of these users had accumulated any usage history, so all had identical EUPs of 0. Before I queued up the jobs, I shut down the negotiator with:

 

condor_off -negotiator
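For reference, a submit description along these lines would queue 10 such jobs per user (a sketch only; sleeper.py appears in the condor_q listing below, but its arguments are truncated there, so the ones shown here are hypothetical):

# sleep.sub -- hypothetical submit file for the test jobs
universe    = vanilla
executable  = sleeper.py
arguments   = --min=60              # hypothetical; the real arguments are truncated in the condor_q output
output      = sleep.$(Cluster).$(Process).out
error       = sleep.$(Cluster).$(Process).err
log         = sleep.log
queue 10

Submitted once per user with condor_submit sleep.sub.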

 

You can see the jobs ready to go:

 

-bash-3.2# condor_status -submitter

Name                 Machine      Running IdleJobs HeldJobs

alice@.internal      domU-12-31         0       10        0
bob@.internal        domU-12-31         0       10        0
eve@.internal        domU-12-31         0       10        0
test.user@.internal  domU-12-31         0       10        0

                           RunningJobs           IdleJobs           HeldJobs

     alice@.internal                 0                 10                  0
       bob@.internal                 0                 10                  0
       eve@.internal                 0                 10                  0
 test.user@.internal                 0                 10                  0

               Total                 0                 40                  0

 

I turned on the negotiator for one negotiation cycle and got one job from each user assigned to each of the four slots in my pool:

 

-bash-3.2# condor_q -const 'jobstatus == 2'

-- Submitter: Q1@domU-12-31-38-04-9C-A1 : <10.220.159.79:59831> : domU-12-31-38-04-9C-A1.compute-1.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   test.user       8/22 11:12   0+00:05:44 R  0   0.0  sleeper.py --min=6
   3.0   alice           8/22 11:14   0+00:05:47 R  0   0.0  sleeper.py --min=6
   4.0   bob             8/22 11:14   0+00:05:47 R  0   0.0  sleeper.py --min=6
   5.0   eve             8/22 11:14   0+00:05:45 R  0   0.0  sleeper.py --min=6

 

The negotiator log for that single cycle shows the fair-share limit being applied:

08/22/12 11:15:23 ---------- Started Negotiation Cycle ----------
08/22/12 11:15:23 Phase 1:  Obtaining ads from collector ...
08/22/12 11:15:23   Getting all public ads ...
08/22/12 11:15:24   Sorting 17 ads ...
08/22/12 11:15:24   Getting startd private ads ...
08/22/12 11:15:24 Got ads: 17 public and 4 private
08/22/12 11:15:24 Public ads include 4 submitter, 4 startd
08/22/12 11:15:24 Phase 2:  Performing accounting ...
08/22/12 11:15:24 Phase 3:  Sorting submitter ads by priority ...
08/22/12 11:15:24 Phase 4.1:  Negotiating with schedds ...
08/22/12 11:15:24   Negotiating with alice@.internal at <10.220.159.79:59831>
08/22/12 11:15:24 0 seconds so far
08/22/12 11:15:24     Request 00003.00000:
08/22/12 11:15:24       Matched 3.0 alice@.internal <10.220.159.79:59831> preempting none <10.123.7.99:57106> ip-10-123-7-99.ec2.internal
08/22/12 11:15:24       Successfully matched with ip-10-123-7-99.ec2.internal
08/22/12 11:15:24     Request 00003.00001:
08/22/12 11:15:24       Rejected 3.1 alice@.internal <10.220.159.79:59831>: fair share exceeded
08/22/12 11:15:24     Got NO_MORE_JOBS;  done negotiating
08/22/12 11:15:24   Negotiating with bob@.internal at <10.220.159.79:59831>
08/22/12 11:15:24 0 seconds so far
08/22/12 11:15:24     Request 00004.00000:
08/22/12 11:15:24       Matched 4.0 bob@.internal <10.220.159.79:59831> preempting none <10.93.21.85:53716> ip-10-93-21-85.ec2.internal
08/22/12 11:15:24       Successfully matched with ip-10-93-21-85.ec2.internal
08/22/12 11:15:24     Request 00004.00001:
08/22/12 11:15:24       Rejected 4.1 bob@.internal <10.220.159.79:59831>: fair share exceeded
08/22/12 11:15:25     Got NO_MORE_JOBS;  done negotiating
08/22/12 11:15:25   Negotiating with eve@.internal at <10.220.159.79:59831>
08/22/12 11:15:25 0 seconds so far
08/22/12 11:15:25     Request 00005.00000:
08/22/12 11:15:25       Matched 5.0 eve@.internal <10.220.159.79:59831> preempting none <10.127.163.251:50135> ip-10-127-163-251.ec2.internal
08/22/12 11:15:25       Successfully matched with ip-10-127-163-251.ec2.internal
08/22/12 11:15:25     Request 00005.00001:
08/22/12 11:15:25       Rejected 5.1 eve@.internal <10.220.159.79:59831>: fair share exceeded
08/22/12 11:15:25     Got NO_MORE_JOBS;  done negotiating
08/22/12 11:15:25   Negotiating with test.user@.internal at <10.220.159.79:59831>
08/22/12 11:15:25 0 seconds so far
08/22/12 11:15:25     Request 00002.00000:
08/22/12 11:15:25       Matched 2.0 test.user@.internal <10.220.159.79:59831> preempting none <10.220.109.195:45947> domU-12-31-38-04-6E-39.compute-1.internal
08/22/12 11:15:25       Successfully matched with domU-12-31-38-04-6E-39.compute-1.internal
08/22/12 11:15:25     Reached submitter resource limit: 1.000000 ... stopping
08/22/12 11:15:25  negotiateWithGroup resources used scheddAds length 4
08/22/12 11:15:25 ---------- Finished Negotiation Cycle ----------

 

Condor determines the fair-share allotments at the outset of the negotiation cycle, so it stopped after each user got one machine: their fair share.
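To sketch the arithmetic (assuming the usual rule that a submitter's share of the pool is proportional to 1/EUP): with four submitters at identical EUPs competing for four slots, each pie slice works out to 4 * (1/EUP) / (4 * 1/EUP) = 1 slot, which is exactly the "Reached submitter resource limit: 1.000000" line in the log above.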

6. Case #2: All 4 users submit exactly M jobs each (where M > 50). The expected behavior after the first negotiation cycle is that each user will get 50 machines (1 per job) and will still have (M-50) jobs pending in the queue.

Yes. This is what will happen. Again, assuming their EUPs are all equal. 

If I understood section 3.4.5 (Negotiation) of the Condor version 7.8.1 manual correctly, the negotiation algorithm will attempt to fulfill the first submitter's full job list before moving on to the next submitter and its job list. So in my case #2 above, user A (if it is the first submitter) would get M machines allocated, leaving B, C, or D with fewer than 50 machines, as I would then expect from that section.

No, that's not what happens. The negotiator determines up front, using the EUP of each submitter, what each submitter's fair share of the machines should be for this negotiation cycle. And based on that it moves through each submitter's list of idle jobs and tries to match them to slots.

 

If the EUPs of your users aren't all identical then the allocations will not be equal. Some users will get more because they've used less in the recent past. Some users will get less because they've used more in the recent past.
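How quickly that "recent past" fades is governed by the priority half-life on the central manager. A minimal config sketch, assuming the 7.x knob name and what I believe is its shipped default of one day:

# condor_config on the central manager
PRIORITY_HALFLIFE = 86400    # accumulated usage decays by half every 86400 seconds (24 hours)

A condor_reconfig on that host should pick up a change to this value.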

So, to summarize, my questions are:

1. Will my expected behavior for both case #1 and case #2 above indeed occur, under my assumptions?

Only if you also add the assumption that all EUPs are identical for the users.

2. How does setting Hierarchical Group Quotas (as in section 3.4.8) affect the negotiation flow? Does it change the pie slice value?

Accounting groups help ensure that, regardless of EUP, people get some minimum (and possibly maximum) number of slots in your pool when they have jobs in a queue.

 

If you wanted each user to always get 50 machines, but be able to use more than 50 when other users aren't using their machines, you'd set up soft quotas for 4 different groups and put each user in a unique group. Condor will then attempt to fulfill the quotas first and, once all the quotas have been satisfied, it'll let the excess free resources be used, fair share, by anyone whose group has a soft quota.
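A hedged sketch of what that might look like in the negotiator's condor_config, using the 7.8-era group-quota knobs (the group names and the 50-slot quotas are illustrative assumptions, not a recommendation):

# One accounting group per user, each with a quota of 50 slots
GROUP_NAMES = group_a, group_b, group_c, group_d
GROUP_QUOTA_group_a = 50
GROUP_QUOTA_group_b = 50
GROUP_QUOTA_group_c = 50
GROUP_QUOTA_group_d = 50

# Make the quotas "soft": groups may take surplus slots beyond their quota
# when other groups leave theirs idle
GROUP_ACCEPT_SURPLUS = True

Each user would then tag their jobs with their group in the submit file, for example:

+AccountingGroup = "group_a.alice"

After a condor_reconfig of the central manager, the negotiator should honor the quotas on its next cycle.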

 

Regards,

- Ian

 

---

Ian Chesal

 

Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools