
Re: [HTCondor-users] Negotiation and group_quota issue



Thanks Greg,
Here is the requested grep, with a few lines of context before each match. It's all from a single negotiation cycle.

First

67-11/16/23 09:11:51 group quotas: subtree groupe receiving quota= 3.312
68-11/16/23 09:11:51 group quotas: group groupe, allocated 0 for static children, 3.312 for dynamic children
69-11/16/23 09:11:51 group quotas: group groupe assigned quota= 3.312
70-11/16/23 09:11:51 group quotas: subtree groupf receiving quota= 3.312
71-11/16/23 09:11:51 group quotas: group groupf, allocated 0 for static children, 3.312 for dynamic children
72-11/16/23 09:11:51 group quotas: group groupf assigned quota= 3.312
73-11/16/23 09:11:51 group quotas: subtree groupg receiving quota= 0.84
74-11/16/23 09:11:51 group quotas: group groupg, allocated 0 for static children, 0.84 for dynamic children
75-11/16/23 09:11:51 group quotas: group groupg assigned quota= 0.84
76-11/16/23 09:11:51 group quotas: group <none> assigned quota= 0.684
77-11/16/23 09:11:51 group quota: Quotas have been assigned to the following groups
78-11/16/23 09:11:51 Group                Computed   Config    Quota      Use     Auto  Claimed Requestd SubmtersAllocatd
79-11/16/23 09:11:51 Name                    quota    quota   static  surplus  Regroup    cores    cores in group   cores
80-11/16/23 09:11:51 ----------------------------------------------------------------------------------------------------
81-11/16/23 09:11:51 <none>                  0.684        0        N        Y        N        0        0        0       0
82-11/16/23 09:11:51 groupa                    0.3    0.025        N        Y        N        0  4361.07        5       0
83-11/16/23 09:11:51 groupb                  3.312    0.276        N        Y        N        0  59867.1        5       0
84-11/16/23 09:11:51 groupc                   0.06    0.005        N        Y        N        0        0        0       0
85-11/16/23 09:11:51 guest                    0.06    0.005        N        Y        N        0        0        0       0
86-11/16/23 09:11:51 groupd                   0.12     0.01        N        Y        N        0        2        1       0
87-11/16/23 09:11:51 groupe                  3.312    0.276        N        Y        N        3  12000.8        7       0
88-11/16/23 09:11:51 groupf                  3.312    0.276        N        Y        N        0   259114       11       0
89-11/16/23 09:11:51 groupg                   0.84     0.07        N        Y        N        0        2        1       0
90-11/16/23 09:11:51 group quotas: allocation round 1
91-11/16/23 09:11:51 group quotas: fairshare (1): group= <none>  quota= 0.684  requested= 0
92-11/16/23 09:11:51 group quotas: fairshare (2): group= <none>  quota= 0.684  allocated= 0  requested= 0
93-11/16/23 09:11:51 group quotas: fairshare (1): group= groupa  quota= 0.3  requested= 4361.07
94-11/16/23 09:11:51 group quotas: fairshare (2): group= groupa  quota= 0.3  allocated= 0.3  requested= 4360.77
95-11/16/23 09:11:51 group quotas: fairshare (1): group= groupb  quota= 3.312  requested= 59867.1
96-11/16/23 09:11:51 group quotas: fairshare (2): group= groupb  quota= 3.312  allocated= 3.312  requested= 59863.8
97-11/16/23 09:11:51 group quotas: fairshare (1): group= groupc  quota= 0.06  requested= 0
98-11/16/23 09:11:51 group quotas: fairshare (2): group= groupc  quota= 0.06  allocated= 0  requested= 0
99-11/16/23 09:11:51 group quotas: fairshare (1): group= guest  quota= 0.06  requested= 0
100-11/16/23 09:11:51 group quotas: fairshare (2): group= guest  quota= 0.06  allocated= 0  requested= 0
101-11/16/23 09:11:51 group quotas: fairshare (1): group= groupd  quota= 0.12  requested= 2
102-11/16/23 09:11:51 group quotas: fairshare (2): group= groupd  quota= 0.12  allocated= 0.12  requested= 1.88
103-11/16/23 09:11:51 group quotas: fairshare (1): group= groupe  quota= 3.312  requested= 12000.8
104-11/16/23 09:11:51 group quotas: fairshare (2): group= groupe  quota= 3.312  allocated= 3.312  requested= 11997.4
105-11/16/23 09:11:51 group quotas: fairshare (1): group= groupf  quota= 3.312  requested= 259114
106-11/16/23 09:11:51 group quotas: fairshare (2): group= groupf  quota= 3.312  allocated= 3.312  requested= 259110
107-11/16/23 09:11:51 group quotas: fairshare (1): group= groupg  quota= 0.84  requested= 2
108-11/16/23 09:11:51 group quotas: fairshare (2): group= groupg  quota= 0.84  allocated= 0.84  requested= 1.16
109-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= <none>  surplus= 0.804  subtree-requested= 335336
110-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= <none>  requested= 335336  surplus= 0.804
111-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 335336  surplus= 0.804
112-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= groupa  surplus= 0.0215434  subtree-requested= 4360.77
113-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= groupa  requested= 4360.77  surplus= 0.0215434
114-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 4360.77  surplus= 0.0215434
115-11/16/23 09:11:51 group quotas: allocate-surplus (4): group groupa allocated surplus= 0.0215434  allocated= 0.321543  requested= 4360.75
116-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= groupb  surplus= 0.237839  subtree-requested= 59863.8
117-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= groupb  requested= 59863.8  surplus= 0.237839
118-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 59863.8  surplus= 0.237839
119-11/16/23 09:11:51 group quotas: allocate-surplus (4): group groupb allocated surplus= 0.237839  allocated= 3.54984  requested= 59863.6
120-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= groupd  surplus= 0.00861736  subtree-requested= 1.88
121-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= groupd  requested= 1.88  surplus= 0.00861736
122-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 1.88  surplus= 0.00861736
123-11/16/23 09:11:51 group quotas: allocate-surplus (4): group groupd allocated surplus= 0.00861736  allocated= 0.128617  requested= 1.87138
124-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= groupe  surplus= 0.237839  subtree-requested= 11997.4
125-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= groupe  requested= 11997.4  surplus= 0.237839
126-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 11997.4  surplus= 0.237839
127-11/16/23 09:11:51 group quotas: allocate-surplus (4): group groupe allocated surplus= 0.237839  allocated= 3.54984  requested= 11997.2
128-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= groupf  surplus= 0.237839  subtree-requested= 259110
129-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= groupf  requested= 259110  surplus= 0.237839
130-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 259110  surplus= 0.237839
131-11/16/23 09:11:51 group quotas: allocate-surplus (4): group groupf allocated surplus= 0.237839  allocated= 3.54984  requested= 259110
132-11/16/23 09:11:51 group quotas: allocate-surplus (1): group= groupg  surplus= 0.0603215  subtree-requested= 1.16
133-11/16/23 09:11:51 group quotas: allocate-surplus (2b): quota-based allocation, group= groupg  requested= 1.16  surplus= 0.0603215
134-11/16/23 09:11:51 group quotas: allocate-surplus-loop: by_quota= 1  iteration= 1  requested= 1.16  surplus= 0.0603215
135-11/16/23 09:11:51 group quotas: allocate-surplus (4): group groupg allocated surplus= 0.0603215  allocated= 0.900322  requested= 1.09968
136-11/16/23 09:11:51 group quotas: allocate-surplus (4): group <none> allocated surplus= 0  allocated= 0  requested= 0
137-11/16/23 09:11:51 group quotas: fairshare (3): group= <none>  surplus= 0  subtree_requested= 335335
138-11/16/23 09:11:51 group quotas: group= <none>  quota= 0.684  requested= 0  allocated= 0  unallocated= 0
139-11/16/23 09:11:51 group quotas: group= groupa  quota= 0.3  requested= 4361.07  allocated= 0.321543  unallocated= 4360.75
140-11/16/23 09:11:51 group quotas: group= groupb  quota= 3.312  requested= 59867.1  allocated= 3.54984  unallocated= 59863.6
141-11/16/23 09:11:51 group quotas: group= groupc  quota= 0.06  requested= 0  allocated= 0  unallocated= 0
142-11/16/23 09:11:51 group quotas: group= guest  quota= 0.06  requested= 0  allocated= 0  unallocated= 0
143-11/16/23 09:11:51 group quotas: group= groupd  quota= 0.12  requested= 2  allocated= 0.128617  unallocated= 1.87138
144-11/16/23 09:11:51 group quotas: group= groupe  quota= 3.312  requested= 12000.8  allocated= 3.54984  unallocated= 11997.2
145-11/16/23 09:11:51 group quotas: group= groupf  quota= 3.312  requested= 259114  allocated= 3.54984  unallocated= 259110
146-11/16/23 09:11:51 group quotas: group= groupg  quota= 0.84  requested= 2  allocated= 0.900322  unallocated= 1.09968
147:11/16/23 09:11:51 group quotas: groups= 9  requesting= 6  served= 6  unserved= 0  requested= 335347  allocated= 12  surplus= 0  maxdelta= 3.54984
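
For reference, the round 1 numbers above can be reproduced by hand. The following is a minimal sketch, assuming a 12-slot pool (which matches allocated= 12 in the summary line) and a single proportional pass over the surplus. The function and variable names are made up for illustration; this only mirrors the arithmetic visible in the log and is not the negotiator's actual code.

```python
TOTAL_SLOTS = 12  # assumption: taken from "allocated= 12" in the summary line

# "Config quota" and "Requestd cores" columns from the table above
CONFIG_QUOTA = {"groupa": 0.025, "groupb": 0.276, "groupc": 0.005,
                "guest": 0.005, "groupd": 0.01, "groupe": 0.276,
                "groupf": 0.276, "groupg": 0.07}
REQUESTED = {"groupa": 4361.07, "groupb": 59867.1, "groupc": 0, "guest": 0,
             "groupd": 2, "groupe": 12000.8, "groupf": 259114, "groupg": 2}


def round_one(config_quota, requested, total_slots):
    """Hypothetical helper: fairshare step, then one quota-weighted surplus pass."""
    # Computed quota = configured fraction * pool size; <none> gets the remainder.
    quota = {g: frac * total_slots for g, frac in config_quota.items()}
    quota["<none>"] = total_slots - sum(quota.values())
    want = {g: requested.get(g, 0.0) for g in quota}

    # Fairshare: each group gets at most its quota, and no more than it asked for.
    allocated = {g: min(quota[g], want[g]) for g in quota}

    # Whatever is left over is split among groups that still want more,
    # in proportion to their quotas.
    surplus = total_slots - sum(allocated.values())
    hungry = [g for g in quota if want[g] > allocated[g]]
    weight = sum(quota[g] for g in hungry)
    for g in hungry:
        share = surplus * quota[g] / weight
        allocated[g] += min(share, want[g] - allocated[g])
    return quota, allocated


quota, allocated = round_one(CONFIG_QUOTA, REQUESTED, TOTAL_SLOTS)
for g in ("groupa", "groupe", "groupg"):
    print(f"{g}: quota={quota[g]:.3f} allocated={allocated[g]:.6g}")
```

With these inputs groupe ends up with a fractional allocation slightly above 3.5, the same value the log reports as maxdelta.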

Second

1219-11/16/23 09:11:52 0 seconds so far for this submitter
1220-11/16/23 09:11:52 0 seconds so far for this schedd
1221-11/16/23 09:11:52    maxAllowed= 0   groupQuota= 3   groupusage=  3
1222-11/16/23 09:11:52   Calculating submitter limit with the following parameters
1223-11/16/23 09:11:52     SubmitterPrio       = 1150078.000000
1224-11/16/23 09:11:52     SubmitterPrioFactor = 1000000.000000
1225-11/16/23 09:11:52     submitterShare      = 0.000310
1226-11/16/23 09:11:52     submitterAbsShare   = 0.000713
1227-11/16/23 09:11:52     submitterLimit    = 0.000000
1228-11/16/23 09:11:52     submitterCeiling remaining   = 2147483647
1229-11/16/23 09:11:52     submitterUsage    = 3.000000
1230-11/16/23 09:11:52 Using resource request list from cache.
1231-11/16/23 09:11:52 Socket to groupe.dudu@dudu-linux-vpc (<172.20.61.14:20123?addrs=172.20.61.14-20123&alias=dudu-linux-vpc&noUDP&sock=schedd_859552_c189>) already in cache, reusing
1232-11/16/23 09:11:52 Started NEGOTIATE with remote schedd; protocol version 1.
1233-11/16/23 09:11:52     Over submitter resource limit (0.000000, used 0.000000) ... only consider startd ranks
1234-11/16/23 09:11:52     Request 00041.00003: autocluster 10 (request count 1 of 13)
1235-11/16/23 09:11:52 matchmakingAlgorithm: limit 0.000000 used 0.000000 pieLeft 2.999069
1236-11/16/23 09:11:52 Concurrency Limit: lambda_xcache is 3.000000 of max 20000.000000
1237-11/16/23 09:11:52     Send END_NEGOTIATE to remote schedd
1238-11/16/23 09:11:52   This submitter hit its submitterLimit.
1239-11/16/23 09:11:52  resources used scheddUsed= 3.000000
1240-11/16/23 09:11:52 Group groupe is using its quota 3 - halting negotiation
1241-11/16/23 09:11:52  negotiateWithGroup resources used submitterAds length 1
1242-11/16/23 09:11:52 Group <none> - sortkey= 3.4e+38
1243-11/16/23 09:11:52 Group <none> - skipping, zero slots allocated
1244-11/16/23 09:11:52 group quotas: Group <none>  allocated= 0  usage= 0
1245-11/16/23 09:11:52 group quotas: Group groupa  allocated= 0.321543  usage= 0
1246-11/16/23 09:11:52 group quotas: Group groupa - resetting requested to 0 because not all the requested jobs matched to slots.
1247-11/16/23 09:11:52 group quotas: Group groupb  allocated= 3.54984  usage= 0
1248-11/16/23 09:11:52 group quotas: Group groupb - resetting requested to 0 because not all the requested jobs matched to slots.
1249-11/16/23 09:11:52 group quotas: Group groupc  allocated= 0  usage= 0
1250-11/16/23 09:11:52 group quotas: Group guest  allocated= 0  usage= 0
1251-11/16/23 09:11:52 group quotas: Group groupd  allocated= 0.128617  usage= 0
1252-11/16/23 09:11:52 group quotas: Group groupd - resetting requested to 0 because not all the requested jobs matched to slots.
1253-11/16/23 09:11:52 group quotas: Group groupe  allocated= 3.54984  usage= 3
1254-11/16/23 09:11:52 group quotas: Group groupe - resetting requested to 3 because not all the requested jobs matched to slots.
1255-11/16/23 09:11:52 group quotas: Group groupf  allocated= 3.54984  usage= 0
1256-11/16/23 09:11:52 group quotas: Group groupf - resetting requested to 0 because not all the requested jobs matched to slots.
1257-11/16/23 09:11:52 group quotas: Group groupg  allocated= 0.900322  usage= 0
1258-11/16/23 09:11:52 group quotas: Group groupg - resetting requested to 0 because not all the requested jobs matched to slots.
1259-11/16/23 09:11:52 Round 1 totals: allocated= 12  usage= 3
1260-11/16/23 09:11:52 group quotas: allocation round 2
1261-11/16/23 09:11:52 group quotas: fairshare (1): group= <none>  quota= 0.684  requested= 0
1262-11/16/23 09:11:52 group quotas: fairshare (2): group= <none>  quota= 0.684  allocated= 0  requested= 0
1263-11/16/23 09:11:52 group quotas: fairshare (1): group= groupa  quota= 0.3  requested= 0
1264-11/16/23 09:11:52 group quotas: fairshare (2): group= groupa  quota= 0.3  allocated= 0  requested= 0
1265-11/16/23 09:11:52 group quotas: fairshare (1): group= groupb  quota= 3.312  requested= 0
1266-11/16/23 09:11:52 group quotas: fairshare (2): group= groupb  quota= 3.312  allocated= 0  requested= 0
1267-11/16/23 09:11:52 group quotas: fairshare (1): group= groupc  quota= 0.06  requested= 0
1268-11/16/23 09:11:52 group quotas: fairshare (2): group= groupc  quota= 0.06  allocated= 0  requested= 0
1269-11/16/23 09:11:52 group quotas: fairshare (1): group= guest  quota= 0.06  requested= 0
1270-11/16/23 09:11:52 group quotas: fairshare (2): group= guest  quota= 0.06  allocated= 0  requested= 0
1271-11/16/23 09:11:52 group quotas: fairshare (1): group= groupd  quota= 0.12  requested= 0
1272-11/16/23 09:11:52 group quotas: fairshare (2): group= groupd  quota= 0.12  allocated= 0  requested= 0
1273-11/16/23 09:11:52 group quotas: fairshare (1): group= groupe  quota= 3.312  requested= 3
1274-11/16/23 09:11:52 group quotas: fairshare (2): group= groupe  quota= 3.312  allocated= 3  requested= 0
1275-11/16/23 09:11:52 group quotas: fairshare (1): group= groupf  quota= 3.312  requested= 0
1276-11/16/23 09:11:52 group quotas: fairshare (2): group= groupf  quota= 3.312  allocated= 0  requested= 0
1277-11/16/23 09:11:52 group quotas: fairshare (1): group= groupg  quota= 0.84  requested= 0
1278-11/16/23 09:11:52 group quotas: fairshare (2): group= groupg  quota= 0.84  allocated= 0  requested= 0
1279-11/16/23 09:11:52 group quotas: allocate-surplus (1): group= <none>  surplus= 9  subtree-requested= 0
1280-11/16/23 09:11:52 group quotas: fairshare (3): group= <none>  surplus= 9  subtree_requested= 0
1281-11/16/23 09:11:52 group quotas: group= <none>  quota= 0.684  requested= 0  allocated= 0  unallocated= 0
1282-11/16/23 09:11:52 group quotas: group= groupa  quota= 0.3  requested= 0  allocated= 0  unallocated= 0
1283-11/16/23 09:11:52 group quotas: group= groupb  quota= 3.312  requested= 0  allocated= 0  unallocated= 0
1284-11/16/23 09:11:52 group quotas: group= groupc  quota= 0.06  requested= 0  allocated= 0  unallocated= 0
1285-11/16/23 09:11:52 group quotas: group= guest  quota= 0.06  requested= 0  allocated= 0  unallocated= 0
1286-11/16/23 09:11:52 group quotas: group= groupd  quota= 0.12  requested= 0  allocated= 0  unallocated= 0
1287-11/16/23 09:11:52 group quotas: group= groupe  quota= 3.312  requested= 3  allocated= 3  unallocated= 0
1288-11/16/23 09:11:52 group quotas: group= groupf  quota= 3.312  requested= 0  allocated= 0  unallocated= 0
1289-11/16/23 09:11:52 group quotas: group= groupg  quota= 0.84  requested= 0  allocated= 0  unallocated= 0
1290:11/16/23 09:11:52 group quotas: groups= 9  requesting= 1  served= 1  unserved= 0  requested= 3  allocated= 3  surplus= 9  maxdelta= 0
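
To compare the two rounds side by side, the summary lines (the ones beginning "group quotas: groups=") can be turned into dictionaries. A small sketch; parse_summary is a made-up helper name, and the regular expression assumes the "key= value" layout seen in the lines above.

```python
import re


def parse_summary(line):
    """Return the key= value fields of a 'group quotas: groups=' line as floats."""
    # Drop the grep prefix and timestamp; keep only what follows the tag.
    _, _, tail = line.partition("group quotas: ")
    return {key: float(value) for key, value in re.findall(r"(\w+)=\s*(\S+)", tail)}


round1 = parse_summary(
    "11/16/23 09:11:51 group quotas: groups= 9  requesting= 6  served= 6  "
    "unserved= 0  requested= 335347  allocated= 12  surplus= 0  maxdelta= 3.54984"
)
round2 = parse_summary(
    "11/16/23 09:11:52 group quotas: groups= 9  requesting= 1  served= 1  "
    "unserved= 0  requested= 3  allocated= 3  surplus= 9  maxdelta= 0"
)

# Show only the fields that differ between round 1 and round 2.
for key in round1:
    if round1[key] != round2[key]:
        print(f"{key}: round 1 = {round1[key]:g}, round 2 = {round2[key]:g}")
```

The fields that change between the rounds are requesting, served, requested, allocated, surplus and maxdelta.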

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Greg Thain via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: 15 November 2023 23:56
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Negotiation and group_quota issue
 


Hi David:

I think I may see what is going on.  Would it be possible to send the lines just before this from the NegotiatorLog, up to and including the line that begins:

group quotas: groups=

(If sensitive, I don't need the names of the groups, but I'm particularly interested in the maxdelta value.)


-greg



--------------------Sent many jobs; 3 are running, expected 12.
10:37:06 Group physics - BEGIN NEGOTIATION with a quota limit of 3.54984
10:37:06 Group physics is using its quota 3 - halting negotiation
10:37:06 Group physics - BEGIN NEGOTIATION with a quota limit of -nan
10:37:14 Group physics - BEGIN NEGOTIATION with a quota limit of 3.54984
10:37:14 Group physics is using its quota 3 - halting negotiation
10:37:14 Group physics - BEGIN NEGOTIATION with a quota limit of -nan
10:37:23 Group physics - BEGIN NEGOTIATION with a quota limit of 3.54984
10:37:23 Group physics is using its quota 3 - halting negotiation

--------------------Restart condor on a single EP
10:37:23 Group physics - BEGIN NEGOTIATION with a quota limit of 12
10:37:23 Group physics - skipping, no submitters (usage=8)
10:37:30 Group physics - BEGIN NEGOTIATION with a quota limit of 3.54984
10:37:30 Group physics is using its quota 3 - halting negotiation
10:37:30 Group physics - BEGIN NEGOTIATION with a quota limit of 12
10:37:30 Group physics is using its quota 12 - halting negotiation
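
The "-nan" quota limits are easy to miss when scrolling through a long NegotiatorLog. A rough sketch for flagging them; non_finite_limits is a hypothetical helper, and the pattern only assumes the "BEGIN NEGOTIATION with a quota limit of" wording seen in the excerpt above.

```python
import math
import re

LIMIT_RE = re.compile(
    r"Group (\S+) - BEGIN NEGOTIATION with a quota limit of (\S+)"
)


def non_finite_limits(lines):
    """Return (timestamp, group, raw_limit) for every nan or inf quota limit."""
    hits = []
    for line in lines:
        match = LIMIT_RE.search(line)
        # float() accepts "nan", "-nan" and "inf" spellings.
        if match and not math.isfinite(float(match.group(2))):
            hits.append((line.split()[0], match.group(1), match.group(2)))
    return hits


sample = [
    "10:37:06 Group physics - BEGIN NEGOTIATION with a quota limit of 3.54984",
    "10:37:06 Group physics is using its quota 3 - halting negotiation",
    "10:37:06 Group physics - BEGIN NEGOTIATION with a quota limit of -nan",
]
hits = non_finite_limits(sample)
for timestamp, group, limit in hits:
    print(f"{timestamp} {group}: quota limit {limit}")
```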

This issue is not related to gpus.
I have seen this issue before on a large pool, and it disappeared.

Probably it's something in the configuration, but I can think of something that will trigger that after few
It happens on both versions 9 and 23.

I will keep digging.

David
