
Re: [Condor-users] Fair Resource Allocation question



Hi,

I am finally in a position to verify the allocations kindly described by Ian Chesal on 22 Aug (see the thread below).

 

I am trying to verify that a fair share of resources is indeed matched between two users with the same user priority, both submitting jobs of the same job priority.

I expect that a limited set of equal resources will be shared 50-50 between the users. The actual result I'm seeing is that one user still gets all of its requests filled.

 

Background

The machine I have has 16 CPUs, giving me 16 slots. I have assigned a STARTD_ATTR to just 8 of the slots: MT_Model = "PowerEdge 1850". The other 8 slots have different values for this attribute.
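
For reference, the attribute comes from the startd configuration, roughly along these lines (a sketch only; the slot numbering and the alternate value are placeholders, not my exact config):

MT_Model = "Other model"                  # default for the other 8 slots (placeholder value)
SLOT1_MT_Model = "PowerEdge 1850"         # slots 1-8 carry the value the test matches on
SLOT2_MT_Model = "PowerEdge 1850"
# ... likewise for slots 3 through 8 ...
STARTD_ATTRS = $(STARTD_ATTRS), MT_Model  # publish MT_Model in every slot ad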

 

My test flow:

1.      Two users are assigned the same user priority.

2.      The first user submits a cluster of jobs with 'queue 5' and the job requirement TARGET.MT_Model =?= "PowerEdge 1850" (a submit-file sketch follows this list).

3.      The second user submits a cluster of jobs with 'queue 6' and the same job requirement TARGET.MT_Model =?= "PowerEdge 1850".

4.      I expect that each user will get exactly 4 slots and have the remaining jobs unmatched.

5.      What I actually see is that the second user gets all 6 of its requested slots while the first gets only 2 of its 5, leaving 3 jobs unmatched.
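
For completeness, the submit descriptions look roughly like this (a sketch; the executable name is a placeholder and only the relevant requirement is shown, not the full contents of my simul.cmd files):

# eitan's submit description (leader's is identical apart from "queue 6")
universe     = vanilla
executable   = run_simulation.sh          # placeholder
requirements = ( TARGET.MT_Model =?= "PowerEdge 1850" )
queue 5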

 

I would like to understand what I'm doing wrong here.

 

Test Flow Commands:

 

Both users' priorities are set to the same value:

>condor_userprio -setprio eitan@xxxxxxxxxxx 3

The priority of eitan@xxxxxxxxxxx was set to 3.000000

>condor_userprio -setprio leader@xxxxxxxxxxx 3

The priority of leader@xxxxxxxxxxx was set to 3.000000

> condor_userprio -all

Last Priority Update:  9/20 19:52

                    Effective     Real   Priority   Res   Total Usage       Usage             Last       Time Since

User Name            Priority   Priority  Factor   In Use (wghted-hrs)    Start Time       Usage Time    Last Usage

------------------ ------------ -------- --------- ------ ------------ ---------------- ---------------- ----------

eitan@xxxxxxxxxxx          3.00     3.00      1.00      0        17.06  8/23/2012 10:28  9/20/2012 19:52      <now>

leader@xxxxxxxxxxx         3.00     3.00      1.00      0        28.20  8/23/2012 10:28  9/20/2012 19:52      <now>

------------------ ------------ -------- --------- ------ ------------ ---------------- ---------------- ----------

Number of users: 2                                      0        45.26                   9/19/2012 19:53    0+23:59

 

I turn off the negotiator:

>condor_off -negotiator

Sent "Kill-Daemon" command for "negotiator" to local master

User eitan then executes:

>condor_submit eitan:ENG:hmake_or_jk:eitan:5:3.simul.cmd

Submitting job(s).....

5 job(s) submitted to cluster 282.

>condor_prio -p 17 282

User leader then executes:

>condor_submit aviram:SW:umake:leader:6:3.simul.cmd

Submitting job(s)......

6 job(s) submitted to cluster 283.

>condor_prio -p 17 283

 

The resulting pending queue is as expected:

>condor_q

-- Submitter: MTLSLURM02.yok.mtl.com : <10.0.3.124:57920> : MTLSLURM02.yok.mtl.com

ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD              

 282.0   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

282.1   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

282.2   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

282.3   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

282.4   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

283.0   leader          9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

283.1   leader          9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

283.2   leader          9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

283.3   leader          9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

283.4   leader          9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

283.5   leader          9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

 

11 jobs; 0 completed, 0 removed, 11 idle, 0 running, 0 held, 0 suspended

 

I turn the negotiator back on and look at the results:

>condor_on -negotiator

Sent "Spawn-Daemon" command for "negotiator" to local master

>condor_q

-- Submitter: MTLSLURM02.yok.mtl.com : <10.0.3.124:57920> : MTLSLURM02.yok.mtl.com

ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD              

 282.0   eitan           9/20 19:16   0+00:00:01 R  17  0.0  eitan:ENG:hmake_or

282.1   eitan           9/20 19:16   0+00:00:01 R  17  0.0  eitan:ENG:hmake_or

282.2   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

282.3   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

282.4   eitan           9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or

283.0   leader          9/20 19:17   0+00:00:02 R  17  0.0  aviram:SW:umake:le

283.1   leader          9/20 19:17   0+00:00:02 R  17  0.0  aviram:SW:umake:le

283.2   leader          9/20 19:17   0+00:00:02 R  17  0.0  aviram:SW:umake:le

283.3   leader          9/20 19:17   0+00:00:01 R  17  0.0  aviram:SW:umake:le

283.4   leader          9/20 19:17   0+00:00:01 R  17  0.0  aviram:SW:umake:le

283.5   leader          9/20 19:17   0+00:00:01 R  17  0.0  aviram:SW:umake:le

 

11 jobs; 0 completed, 0 removed, 3 idle, 8 running, 0 held, 0 suspended

 

# As you can see, user leader got 6 matches ("R") while user eitan got only the remaining 2

>condor_q -anal

-- Submitter: MTLSLURM02.yok.mtl.com : <10.0.3.124:57920> : MTLSLURM02.yok.mtl.com

---

282.000:  Request is being serviced

 

---

282.001:  Request is being serviced

 

---

282.002:  Run analysis summary.  Of 16 machines,

      8 are rejected by your job's requirements

      0 reject your job because of their own requirements

      8 match but are serving users with a better priority in the pool

      0 match but reject the job for unknown reasons

      0 match but will not currently preempt their existing job

      0 match but are currently offline

      0 are available to run your job

        No successful match recorded.

        Last failed match: Thu Sep 20 19:35:58 2012

 

        Reason for last match failure: insufficient priority

The Requirements expression for your job is:

 

( ( TARGET.MT_Model is "PowerEdge 1850" ) &&

( target.MT_SimulTime >= 1373881200 ) && ( target.MT_SimulTime <= 1376047080 ) &&

( target.MT_SimulMachineState == "unclaimed" ) ) && ( TARGET.Arch == "X86_64" ) &&

( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&

( TARGET.Memory >= RequestMemory ) &&

( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

 

    Condition                         Machines Matched    Suggestion

    ---------                         ----------------    ----------

1   ( TARGET.MT_Model is "PowerEdge 1850" )

                                      8

2   ( target.MT_SimulTime >= 1373881200 )

                                      16

3   ( target.MT_SimulTime <= 1376047080 )

                                      16

4   ( target.MT_SimulMachineState == "unclaimed" )

                                      16                  

5   ( TARGET.Arch == "X86_64" )       16                  

6   ( TARGET.OpSys == "LINUX" )       16                  

7   ( TARGET.Disk >= 1 )              16                  

8   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )

                                      16                  

9   ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "yok.mtl.com" ) )

                                      16                  

---

282.003:  Run analysis summary.  Of 16 machines,

      8 are rejected by your job's requirements

      0 reject your job because of their own requirements

      8 match but are serving users with a better priority in the pool

      0 match but reject the job for unknown reasons

      0 match but will not currently preempt their existing job

      0 match but are currently offline

      0 are available to run your job

 

The Requirements expression for your job is:

 

( ( TARGET.MT_Model is "PowerEdge 1850" ) &&

( target.MT_SimulTime >= 1373881200 ) && ( target.MT_SimulTime <= 1376047080 ) &&

( target.MT_SimulMachineState == "unclaimed" ) ) && ( TARGET.Arch == "X86_64" ) &&

( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&

( TARGET.Memory >= RequestMemory ) &&

( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

 

    Condition                         Machines Matched    Suggestion

    ---------                         ----------------    ----------

1   ( .RIGHT.MT_Model is "PowerEdge 1850" )

                                      8

2   ( .RIGHT.MT_SimulTime >= 1373881200 )

                                      16

3   ( .RIGHT.MT_SimulTime <= 1376047080 )

                                      16

4   ( .RIGHT.MT_SimulMachineState == "unclaimed" )

                                      16                  

5   ( .RIGHT.Arch == "X86_64" )       16                  

6   ( .RIGHT.OpSys == "LINUX" )       16                  

7   ( .RIGHT.Disk >= 1 )              16                  

8   ( .RIGHT.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )

                                      16                  

9   ( ( .RIGHT.HasFileTransfer ) || ( .RIGHT.FileSystemDomain == "yok.mtl.com" ) )

                                      16                  

---

282.004:  Run analysis summary.  Of 16 machines,

      8 are rejected by your job's requirements

      0 reject your job because of their own requirements

      8 match but are serving users with a better priority in the pool

      0 match but reject the job for unknown reasons

      0 match but will not currently preempt their existing job

      0 match but are currently offline

      0 are available to run your job

 

The Requirements expression for your job is:

 

( ( TARGET.MT_Model is "PowerEdge 1850" ) &&

( target.MT_SimulTime >= 1373881200 ) && ( target.MT_SimulTime <= 1376047080 ) &&

( target.MT_SimulMachineState == "unclaimed" ) ) && ( TARGET.Arch == "X86_64" ) &&

( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&

( TARGET.Memory >= RequestMemory ) &&

( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

    Condition                         Machines Matched    Suggestion

    ---------                         ----------------    ----------

1   ( .RIGHT.MT_Model is "PowerEdge 1850" )

                                      8

2   ( .RIGHT.MT_SimulTime >= 1373881200 )

                                      16

3   ( .RIGHT.MT_SimulTime <= 1376047080 )

                                      16

4   ( .RIGHT.MT_SimulMachineState == "unclaimed" )

                                      16                  

5   ( .RIGHT.Arch == "X86_64" )       16                  

6   ( .RIGHT.OpSys == "LINUX" )       16                  

7   ( .RIGHT.Disk >= 1 )              16                  

8   ( .RIGHT.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )

                                      16                  

9   ( ( .RIGHT.HasFileTransfer ) || ( .RIGHT.FileSystemDomain == "yok.mtl.com" ) )

                                      16                  

---

283.000:  Request is being serviced

 

---

283.001:  Request is being serviced

 

---

283.002:  Request is being serviced

 

---

283.003:  Request is being serviced

 

---

283.004:  Request is being serviced

 

---

283.005:  Request is being serviced
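
The negotiator's own log should also show how the shares were decided for this cycle (the same kind of "fair share exceeded" / "submitter resource limit" lines as in Ian's log below); roughly:

>condor_config_val NEGOTIATOR_LOG

>grep -E "fair share|submitter resource limit|Matched" $(condor_config_val NEGOTIATOR_LOG)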

 

 

<<END OF MY TEST FLOW>>

 

--

Yuval Leader

Design Automation Engineer, Mellanox Technologies

mailto: leader@xxxxxxxxxxxx

Tel: +972-74-7236360     Fax: +972-4-9593245

Beit Mellanox. 6th Floor,R-620

P.O.Box 586, Yokneam Industrial Park, 20692 Israel

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Wednesday, August 22, 2012 6:42 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Fair Resource Allocation question

 

Hi Yuval,

On Wednesday, 22 August, 2012 at 7:35 AM, Yuval Leader wrote:

As a new user, I would like to understand how Condor will handle my expected allocation scenarios below.

 

1. There are 4 users, A, B, C, D.

2. Each user owns 50 machines and they all agree to pool them together into a 200-machine Condor pool.

3. For simplicity, all machines are the same and all jobs have the same requirements all the time, and all users have the same User priority.

4. Assume also that preemption is disabled.

5. Case #1: All 4 users submit exactly N jobs each (where N is less than or equal to 50). The expected behavior after the first negotiation cycle is that each user will get exactly N machines (1 per job).

Add another assumption and yes, this is what you'll see. The other assumption you need to add is that all four users have the exact same effective user priority. See: http://research.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html#25902

 

If they all have the same EUP, they'll all get exactly 1/4 of the system after one negotiation cycle assuming everything about their jobs is equal.
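
In other words, with four submitters at the same EUP and four matching slots, each submitter's share for the cycle works out to 4 x 1/4 = 1 slot, which is exactly what the negotiator log below shows.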

 

This is easy enough to test.

 

I queued up 10 sleep jobs from four users in a new pool that has four slots available in it. None of these users had accumulated any use history so all had identical EUPs of 0. Before I queued up the jobs, I shut down the negotiator with:

 

condor_off -negotiator

 

You can see the jobs ready to go:

 

-bash-3.2# condor_status -submitter

 

Name                 Machine      Running IdleJobs HeldJobs

 

alice@.internal      domU-12-31         0       10        0

bob@.internal        domU-12-31         0       10        0

eve@.internal        domU-12-31         0       10        0

test.user@.internal  domU-12-31         0       10        0

                           RunningJobs           IdleJobs           HeldJobs

 

     alice@.internal                 0                 10                  0

       bob@.internal                 0                 10                  0

       eve@.internal                 0                 10                  0

 test.user@.internal                 0                 10                  0

 

               Total                 0                 40                  0

 

I turned on the negotiator for one negotiation cycle and got one job from each user assigned to each of the four slots in my pool:

 

-bash-3.2# condor_q -const 'jobstatus == 2'

 

 

-- Submitter: Q1@domU-12-31-38-04-9C-A1 : <10.220.159.79:59831> : domU-12-31-38-04-9C-A1.compute-1.internal

 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

   2.0   test.user       8/22 11:12   0+00:05:44 R  0   0.0  sleeper.py --min=6

   3.0   alice           8/22 11:14   0+00:05:47 R  0   0.0  sleeper.py --min=6

   4.0   bob             8/22 11:14   0+00:05:47 R  0   0.0  sleeper.py --min=6

   5.0   eve             8/22 11:14   0+00:05:45 R  0   0.0  sleeper.py --min=6

 

08/22/12 11:15:23 ---------- Started Negotiation Cycle ----------

08/22/12 11:15:23 Phase 1:  Obtaining ads from collector ...

08/22/12 11:15:23   Getting all public ads ...

08/22/12 11:15:24   Sorting 17 ads ...

08/22/12 11:15:24   Getting startd private ads ...

08/22/12 11:15:24 Got ads: 17 public and 4 private

08/22/12 11:15:24 Public ads include 4 submitter, 4 startd

08/22/12 11:15:24 Phase 2:  Performing accounting ...

08/22/12 11:15:24 Phase 3:  Sorting submitter ads by priority ...

08/22/12 11:15:24 Phase 4.1:  Negotiating with schedds ...

08/22/12 11:15:24   Negotiating with alice@.internal at <10.220.159.79:59831>

08/22/12 11:15:24 0 seconds so far

08/22/12 11:15:24     Request 00003.00000:

08/22/12 11:15:24       Matched 3.0 alice@.internal <10.220.159.79:59831> preempting none <10.123.7.99:57106> ip-10-123-7-99.ec2.internal

08/22/12 11:15:24       Successfully matched with ip-10-123-7-99.ec2.internal

08/22/12 11:15:24     Request 00003.00001:

08/22/12 11:15:24       Rejected 3.1 alice@.internal <10.220.159.79:59831>: fair share exceeded

08/22/12 11:15:24     Got NO_MORE_JOBS;  done negotiating

08/22/12 11:15:24   Negotiating with bob@.internal at <10.220.159.79:59831>

08/22/12 11:15:24 0 seconds so far

08/22/12 11:15:24     Request 00004.00000:

08/22/12 11:15:24       Matched 4.0 bob@.internal <10.220.159.79:59831> preempting none <10.93.21.85:53716> ip-10-93-21-85.ec2.internal

08/22/12 11:15:24       Successfully matched with ip-10-93-21-85.ec2.internal

08/22/12 11:15:24     Request 00004.00001:

08/22/12 11:15:24       Rejected 4.1 bob@.internal <10.220.159.79:59831>: fair share exceeded

08/22/12 11:15:25     Got NO_MORE_JOBS;  done negotiating

08/22/12 11:15:25   Negotiating with eve@.internal at <10.220.159.79:59831>

08/22/12 11:15:25 0 seconds so far

08/22/12 11:15:25     Request 00005.00000:

08/22/12 11:15:25       Matched 5.0 eve@.internal <10.220.159.79:59831> preempting none <10.127.163.251:50135> ip-10-127-163-251.ec2.internal

08/22/12 11:15:25       Successfully matched with ip-10-127-163-251.ec2.internal

08/22/12 11:15:25     Request 00005.00001:

08/22/12 11:15:25       Rejected 5.1 eve@.internal <10.220.159.79:59831>: fair share exceeded

08/22/12 11:15:25     Got NO_MORE_JOBS;  done negotiating

08/22/12 11:15:25   Negotiating with test.user@.internal at <10.220.159.79:59831>

08/22/12 11:15:25 0 seconds so far

08/22/12 11:15:25     Request 00002.00000:

08/22/12 11:15:25       Matched 2.0 test.user@.internal <10.220.159.79:59831> preempting none <10.220.109.195:45947> domU-12-31-38-04-6E-39.compute-1.internal

08/22/12 11:15:25       Successfully matched with domU-12-31-38-04-6E-39.compute-1.internal

08/22/12 11:15:25     Reached submitter resource limit: 1.000000 ... stopping

08/22/12 11:15:25  negotiateWithGroup resources used scheddAds length 4 

08/22/12 11:15:25 ---------- Finished Negotiation Cycle ----------

 

Condor determines the fair-share allotments at the outset of the negotiation cycle, so it stopped after each user got one machine -- their fair share.

6. Case #2: All 4 users submit exactly M jobs each (where M > 50). The expected behavior after the first negotiation cycle is that each user will get 50 machines (1 per job) and will still have (M-50) jobs pending in the queue.

Yes. This is what will happen. Again, assuming their EUPs are all equal. 

If I understood section 3.4.5 (Negotiation) of the Condor version 7.8.1 manual correctly, the negotiation algorithm will attempt to fulfill the first submitter's full job list before moving on to the next submitter and getting its job list. So in my case #2 above, user A (if it is the first submitter) will get M machines allocated, leaving B, C, or D with fewer than the 50 machines each that I expect.

No, that's not what happens. The negotiator determines up front, using the EUP of each submitter, what each submitter's fair share of the machines should be for this negotiation cycle. And based on that it moves through each submitter's list of idle jobs and tries to match them to slots.

 

If the EUPs of your users aren't all identical then the allocations will not be equal. Some users will get more because they've used less in the recent past. Some users will get less because they've used more in the recent past.

So to summarize, my questions are:

1. Will my expected behavior for both case #1 and case #2 above indeed occur under my assumptions?

Only if you also add the assumption that all EUPs are identical for the users.

2. How does setting Hierarchical Group Quotas (as in section 3.4.8) affect the negotiation flow? Does it change the Pie Slice value?

Accounting groups help ensure that, regardless of EUP, people get some minimum (and possibly maximum) number of slots in your pool when they have jobs in a queue.

 

If you wanted each user to always get 50 machines, but be able to use more than 50 machines when other users aren't using theirs, you'd set up soft quotas for 4 different groups and put each user in a unique group. Condor will then attempt to fulfill the quotas first and, once all the quotas have been satisfied, it'll let the excess free resources be used, fair share, by anyone who has a soft quota limit.
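
A minimal sketch of that kind of setup (the group names, quota values, and user mapping here are illustrative only, not a tested configuration):

# Negotiator-side configuration
GROUP_NAMES = group_a, group_b, group_c, group_d
GROUP_QUOTA_group_a = 50
GROUP_QUOTA_group_b = 50
GROUP_QUOTA_group_c = 50
GROUP_QUOTA_group_d = 50
GROUP_ACCEPT_SURPLUS = True    # "soft" quotas: a group may use leftover slots beyond its quota

# Submit-side, for a job belonging to user A
+AccountingGroup = "group_a.userA"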

 

Regards,

- Ian

 

---

Ian Chesal

 

Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools