
[Condor-users] Condor negotiator ignoring accounting groups?



Hi,

I'm using condor 7.6.0-1 on a cluster with 4252 job slots. I have the
following accounting groups configured:
GROUP_NAMES = group_opport, group_atlasprod, group_dzero,
group_atlasinstall, group_atlasanaly
GROUP_QUOTA_group_dzero = 20
GROUP_QUOTA_group_opport = 200
GROUP_QUOTA_group_atlasprod = 1650
GROUP_QUOTA_group_atlasinstall = 1
GROUP_QUOTA_group_atlasanaly = 2477
GROUP_PRIO_FACTOR_group_opport = 500
GROUP_PRIO_FACTOR_group_atlasprod = 10
GROUP_PRIO_FACTOR_group_atlasinstall = 99
GROUP_PRIO_FACTOR_group_atlasanaly = 9
GROUP_PRIO_FACTOR_group_dzero = 1000
GROUP_ACCEPT_SURPLUS = TRUE
GROUP_AUTOREGROUP_group_dzero = FALSE
GROUP_AUTOREGROUP_group_opport = FALSE
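
As a quick sanity check on the numbers above: the static quotas add up to more than the 4252 available slots, which would explain the "rescaled" warnings in the negotiator log further down. The sketch below assumes the negotiator simply scales every quota by slots / sum(quotas). That is an inference from the logged values, not something confirmed from the Condor source.

```python
# Static quotas copied from the GROUP_QUOTA_* settings above.
quotas = {
    "group_dzero": 20,
    "group_opport": 200,
    "group_atlasprod": 1650,
    "group_atlasinstall": 1,
    "group_atlasanaly": 2477,
}

# Slot count the negotiator reports after GROUP_DYNAMIC_MACH_CONSTRAINT.
slots = 4252

total = sum(quotas.values())  # 4348, i.e. 96 more than the available slots
scale = slots / total         # assumed proportional rescaling factor

# Rescale each quota by the same factor and print it next to the original.
rescaled = {group: quota * scale for group, quota in quotas.items()}
for group in sorted(rescaled):
    print(f"{group}: {quotas[group]} -> {rescaled[group]:.6g}")
```

Under that assumption the values come out at the figures the log reports (for example 2477 becomes roughly 2422.31), so the quota arithmetic itself looks consistent.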

We're currently running ~3600 jobs in the atlasprod group, and ~470 jobs
in the atlasanaly group.  I would expect that the next jobs to run would
be atlasanaly. However, I find that the negotiator is instead starting
atlasprod jobs.  I attached an excerpt from the Negotiator log for one
such cycle.  The four jobs that were started that cycle were all atlasprod.

Any ideas why that's happening? From the log, it seems that the negotiator is
completely ignoring the accounting group, even though I can see that the
accounting group is set in the job classad.  One difference I do see is
that the group_opport attribute is quoted, while the atlas ones are not:
AccountingGroup = group_atlasprod.usatlas1
AccountingGroup = "group_opport.engage"


--Sarah
01/12/12 13:15:19 ---------- Started Negotiation Cycle ----------
01/12/12 13:15:19 Phase 1:  Obtaining ads from collector ...
01/12/12 13:15:19   Getting all public ads ...
01/12/12 13:15:23   Sorting 4721 ads ...
01/12/12 13:15:23   Getting startd private ads ...
01/12/12 13:15:24 Got ads: 4721 public and 4261 private
01/12/12 13:15:24 Public ads include 4 submitter, 4261 startd
01/12/12 13:15:24 Phase 2:  Performing accounting ...
01/12/12 13:15:25 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 4261 to 4252
01/12/12 13:15:25 group quotas: assigning 4 submitters to accounting groups
01/12/12 13:15:25 group quotas: assigning group quotas from 4252 available slots
01/12/12 13:15:25 group quotas: WARNING: static quota for group group_atlasanaly rescaled from 2477 to 2422.31
01/12/12 13:15:25 group quotas: WARNING: static quota for group group_atlasinstall rescaled from 1 to 0.977921
01/12/12 13:15:25 group quotas: WARNING: static quota for group group_atlasprod rescaled from 1650 to 1613.57
01/12/12 13:15:25 group quotas: WARNING: static quota for group group_dzero rescaled from 20 to 19.5584
01/12/12 13:15:25 group quotas: WARNING: static quota for group group_opport rescaled from 200 to 195.584
01/12/12 13:15:25 group quotas: allocation round 1
01/12/12 13:15:25 group quotas: groups= 6  requesting= 2  served= 2  unserved= 0  slots= 4252  requested= 6154  allocated= 4251  surplus= 1
01/12/12 13:15:25 Group <none> - BEGIN NEGOTIATION
01/12/12 13:15:25 Phase 3:  Sorting submitter ads by priority ...
01/12/12 13:15:25 Phase 4.1:  Negotiating with schedds ...
01/12/12 13:15:25   Negotiating with usatlas1@xxxxxxxxxxxxxxx at <10.1.5.146:33614>
01/12/12 13:15:25 0 seconds so far
01/12/12 13:15:25     Request 3979396.00000:
01/12/12 13:15:26       Matched 3979396.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <149.165.225.152:35051> slot10@iut2-c152.iu.edu
01/12/12 13:15:26       Successfully matched with slot10@xxxxxxxxxxxxxxxx
01/12/12 13:15:26     Request 3979397.00000:
01/12/12 13:15:26       Matched 3979397.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <10.1.4.26:55609> slot23@xxxxxxxxxxxwt2.org
01/12/12 13:15:26       Successfully matched with slot23@xxxxxxxxxxxxxxxxxx
01/12/12 13:15:26     Request 3979398.00000:
01/12/12 13:15:26       Matched 3979398.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <10.1.2.196:33965> slot5@xxxxxxxxxxxwt2.org
01/12/12 13:15:26       Successfully matched with slot5@xxxxxxxxxxxxxxxxxx
01/12/12 13:15:26     Request 3979399.00000:
01/12/12 13:15:26       Matched 3979399.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <10.1.2.48:46915> slot4@xxxxxxxxxxxxxxxxxx
01/12/12 13:15:26       Successfully matched with slot4@xxxxxxxxxxxxxxxxxx
01/12/12 13:15:26     Request 3979400.00000:
01/12/12 13:15:26       Rejected 3979400.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614>: insufficient priority
01/12/12 13:15:26     Request 3979444.00000:
01/12/12 13:15:26       Rejected 3979444.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614>: insufficient priority
01/12/12 13:15:26     Request 3979626.00000:
01/12/12 13:15:27       Rejected 3979626.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614>: insufficient priority
01/12/12 13:15:27     Request 3981309.00000:
01/12/12 13:15:27       Rejected 3981309.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614>: insufficient priority
01/12/12 13:15:28     Got NO_MORE_JOBS;  done negotiating
01/12/12 13:15:28 Phase 4.2:  Negotiating with schedds ...
01/12/12 13:15:28   Negotiating with usatlas1@xxxxxxxxxxxxxxx at <10.1.5.146:33614>
01/12/12 13:15:28 3 seconds so far
01/12/12 13:15:28     Request 3979400.00000:
01/12/12 13:15:28       Matched 3979400.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <149.165.225.118:37008> slot4@xxxxxxxxxxxxxxxx
01/12/12 13:15:28       Successfully matched with slot4@xxxxxxxxxxxxxxxx
01/12/12 13:15:28     Reached submitter resource limit: 1.000000 ... stopping
01/12/12 13:15:28 Phase 4.3:  Negotiating with schedds ...
01/12/12 13:15:28   Negotiating with usatlas1@xxxxxxxxxxxxxxx at <10.1.5.146:33614>
01/12/12 13:15:28 3 seconds so far
01/12/12 13:15:28  negotiateWithGroup resources used scheddAds length 1 
01/12/12 13:15:28 Group group_atlasanaly - skipping, zero slots allocated
01/12/12 13:15:28 Group group_atlasinstall - skipping, zero slots allocated
01/12/12 13:15:28 Group group_atlasprod - skipping, zero slots allocated
01/12/12 13:15:28 Group group_dzero - skipping, zero slots allocated
01/12/12 13:15:28 Group group_opport - BEGIN NEGOTIATION
01/12/12 13:15:28 Phase 3:  Sorting submitter ads by priority ...
01/12/12 13:15:28 Phase 4.1:  Negotiating with schedds ...
01/12/12 13:15:28   Negotiating with group_opport.engage@xxxxxxxxxxxxxxx at <10.1.5.146:33614>
01/12/12 13:15:28 0 seconds so far
01/12/12 13:15:28     Request 3979638.00000:
01/12/12 13:15:29       Rejected 3979638.0 group_opport.engage@xxxxxxxxxxxxxxx <10.1.5.146:33614>: no match found
01/12/12 13:15:29     Request 3981304.00000:
01/12/12 13:15:29       Rejected 3981304.0 group_opport.engage@xxxxxxxxxxxxxxx <10.1.5.146:33614>: no match found
01/12/12 13:15:29     Got NO_MORE_JOBS;  done negotiating
01/12/12 13:15:29 Phase 4.2:  Negotiating with schedds ...
01/12/12 13:15:29   Negotiating with group_opport.engage@xxxxxxxxxxxxxxx at <10.1.5.146:33614>
01/12/12 13:15:29 1 seconds so far
01/12/12 13:15:29  negotiateWithGroup resources used scheddAds length 1 
01/12/12 13:15:29 Round 1 totals: allocated= 4251  usage= 4251
01/12/12 13:15:29 ---------- Finished Negotiation Cycle ----------
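
To see which group each match was negotiated under, the log above can be tallied with a small script. This is only a rough sketch: the function name tally_matches and the two regular expressions are hypothetical helpers written against the line shapes visible in this excerpt, not any Condor tool or API, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Patterns based on the line shapes in the excerpt above (an assumption,
# not an official log grammar).
GROUP_RE = re.compile(r"Group (\S+) - BEGIN NEGOTIATION")
MATCH_RE = re.compile(r"\bMatched \S+ (\S+?)@")

def tally_matches(lines):
    """Count 'Matched' lines per (group, submitter) pair."""
    counts = Counter()
    group = None  # group of the most recent BEGIN NEGOTIATION line
    for line in lines:
        m = GROUP_RE.search(line)
        if m:
            group = m.group(1)
            continue
        m = MATCH_RE.search(line)
        if m:
            counts[(group, m.group(1))] += 1
    return counts

# A few lines copied from the negotiation cycle above.
sample = [
    "01/12/12 13:15:25 Group <none> - BEGIN NEGOTIATION",
    "01/12/12 13:15:26       Matched 3979399.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <10.1.2.48:46915> slot4@xxxxxxxxxxxxxxxxxx",
    "01/12/12 13:15:28       Matched 3979400.0 usatlas1@xxxxxxxxxxxxxxx <10.1.5.146:33614> preempting none <149.165.225.118:37008> slot4@xxxxxxxxxxxxxxxx",
    "01/12/12 13:15:28 Group group_opport - BEGIN NEGOTIATION",
    "01/12/12 13:15:29       Rejected 3979638.0 group_opport.engage@xxxxxxxxxxxxxxx <10.1.5.146:33614>: no match found",
]

for (group, submitter), n in tally_matches(sample).items():
    print(f"{group}  {submitter}  {n}")
```

Run against the full cycle, every "Matched" line falls under "Group <none>" with submitter usatlas1, while the only submitter that appears with a group prefix is group_opport.engage.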