
[HTCondor-users] Negotiation fails due to 'submitter limit exceeded'



Hi all,

I am trying to understand why a number of jobs are not brokered to slots even though the nodes have sufficient free resources.

The pattern seems to be that for some groups the negotiator reports 'submitter limit exceeded', while other groups do not seem to be affected [1].

I am trying to crowd out the 'successful' groups, i.e. those that do not trigger the limit message, in order to 'push' the problematic groups. To that end, I reduced the quotas of the 'successful' groups to 0.0 about 3h ago, but these groups still get slots brokered [2] (AUTOREGROUP is false).

Maybe somebody has an idea what the problem might be?

My next step would be to move the AccountingNew history away and reset the whole accounting.

Cheers,
  Thomas



[1]
> NegotiatorLog
...
04/15/20 11:47:32 Request 21669873.00000: autocluster 37556 (request count 1 of 22)
04/15/20 11:47:32 Rejected 21669873.0 group_ATLAS.atlasprd003@xxxxxxx <131.169.223.111:9620?addrs=131.169.223.111-9620&noUDP&sock=7077_cfd4_3>: submitter limit exceeded
04/15/20 11:47:32 Request 21670466.00000: autocluster 35021 (request count 1 of 6)
04/15/20 11:47:32 Rejected 21670466.0 group_ATLAS.atlasprd003@xxxxxxx <131.169.223.111:9620?addrs=131.169.223.111-9620&noUDP&sock=7077_cfd4_3>: submitter limit exceeded
04/15/20 11:47:32 Group group_ATLAS is using its quota 1942.86 - halting negotiation
04/15/20 11:47:32  negotiateWithGroup resources used submitterAds length 2
...

job
> condor_q -name grid-arcce1.desy.de -better-analyze 21670466.0
   ...
   Reason for last match failure: submitter limit exceeded
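
To see which groups are affected across a longer stretch of the NegotiatorLog, the rejections can be tallied per group and reason. The following is a minimal Python sketch, assuming the log lines look like the "Rejected" lines quoted in [1]; the function name count_rejections and the regular expression are illustrative and not part of HTCondor.

```python
import re
from collections import Counter

# Pattern modelled on the "Rejected" lines quoted in [1];
# other NegotiatorLog line formats are simply skipped.
REJECTED = re.compile(
    r"Rejected\s+(?P<job>\d+\.\d+)\s+(?P<submitter>\S+)\s+<[^>]*>:\s*(?P<reason>.+?)\s*$"
)

def count_rejections(lines):
    """Return a Counter keyed by (group, reason) over all 'Rejected' lines."""
    counts = Counter()
    for line in lines:
        m = REJECTED.search(line)
        if m is None:
            continue
        # Submitter looks like "group_ATLAS.atlasprd003@host";
        # the part before the first dot is taken as the group name.
        group = m.group("submitter").split(".", 1)[0]
        counts[(group, m.group("reason"))] += 1
    return counts

# Sample input: the two rejections and one unrelated line from [1].
sample = [
    "04/15/20 11:47:32 Rejected 21669873.0 group_ATLAS.atlasprd003@xxxxxxx "
    "<131.169.223.111:9620?addrs=131.169.223.111-9620&noUDP&sock=7077_cfd4_3>: "
    "submitter limit exceeded",
    "04/15/20 11:47:32 Rejected 21670466.0 group_ATLAS.atlasprd003@xxxxxxx "
    "<131.169.223.111:9620?addrs=131.169.223.111-9620&noUDP&sock=7077_cfd4_3>: "
    "submitter limit exceeded",
    "04/15/20 11:47:32 Group group_ATLAS is using its quota 1942.86 - halting negotiation",
]

counts = count_rejections(sample)
for (group, reason), n in counts.items():
    print(f"{group}: {n} x {reason}")
```

For a real log, the list could be replaced by an open file handle, since the function only iterates over lines.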


[2]
...
04/15/20 11:47:31 group quotas: WARNING: dynamic quota for group group_ATLAS rescaled from 0.33 to 0.111864
04/15/20 11:47:31 group quotas: WARNING: dynamic quota for group group_BELLE2 rescaled from 0 to 0
04/15/20 11:47:31 group quotas: WARNING: dynamic quota for group group_CMS rescaled from 0.38 to 0.128814
...

...
04/15/20 11:47:32 Request 23469598.00000: autocluster 31222 (request count 6 of 6)
04/15/20 11:47:32 Matched 23469598.0 group_BELLE2.belleprd004@xxxxxxx <131.169.223.110:9620?addrs=131.169.223.110-9620&noUDP&sock=1702258_fa8d_3> preempting none <131.169.161.55:9620?addrs=131.169.161.55-9620&noUDP&sock=10509_e0dd_3> slot1@xxxxxxxxxxxxxxxxx
04/15/20 11:47:32       Successfully matched with slot1@xxxxxxxxxxxxxxxxx
04/15/20 11:47:32  negotiateWithGroup resources used submitterAds length 0

[3]
Last Priority Update:  4/15 11:53
Group                       Config     Use    Effective  Priority    Res  Total Usage Time Since  Requested
  User Name                  Quota Surplus     Priority    Factor In Use (wghted-hrs) Last Usage  Resources
------------------------ --------- ------- ------------ --------- ------ ------------ ---------- ----------
group_DESY                    0.80 ByQuota                1000.00      0        26.25      <now>          0
  desyusr000@xxxxxxx                             500.15   1000.00      2        26.23    0+00:00
group_ILC                     0.04 ByQuota                1000.00      0       760.04    0+00:57          0
group_OPS                     0.90 ByQuota                1000.00      0        57.93    0+00:01          0
  opsusr003@xxxxxxx                              500.01   1000.00      1        14.18    0+00:02
group_ATLAS                   0.33 ByQuota                1000.00   1953    266728.59      <now>        651
  atlassgm000@xxxxxxx                            500.17   1000.00      0        13.13    0+00:10
  atlasprd005@xxxxxxx                          15782.22   1000.00     24      2833.03      <now>
  atlasplt002@xxxxxxx                         363207.00   1000.00    270     78882.20      <now>
  atlasprd003@xxxxxxx                         864514.00   1000.00   1659    185000.12      <now>
group_CMS                     0.38 ByQuota                1000.00   2736    490488.81      <now>        740
  sgmcms@xxxxxxx                                 506.17   1000.00      0       111.01    0+02:40
  cmsplt003@xxxxxxx                            18756.23   1000.00      0      3397.98    0+02:13
  cmsger014@xxxxxxx                           582834.44   1000.00   2152     71347.33      <now>
  cmsplt036@xxxxxxx                          2132669.25   1000.00    584    415632.84      <now>
group_BELLE2                  0.00 ByQuota                1000.00   9343   1542435.50      <now>       9343
  belleprd003@xxxxxxx                            509.38   1000.00      2        66.21      <now>
  belleprd001@xxxxxxx                           3941.14   1000.00      3       696.99      <now>
  belleprd004@xxxxxxx                        8623840.00   1000.00   9338   1541663.00      <now>
group_LHCB                    0.00 ByQuota                1000.00   1921    311050.06      <now>       1921
  lhcbusr000@xxxxxxx                             709.77   1000.00      2       116.21      <now>
  lhcbplt000@xxxxxxx                         1710409.88   1000.00   1919    310934.28      <now>
------------------------ --------- ------- ------------ --------- ------ ------------ ---------- ----------
Number of users: 15                ByQuota                         15956   2610734.75    0+23:59
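
The numbers in [1], [2] and [3] can be cross-checked with a little arithmetic. The Python sketch below uses only values quoted above and rests on two assumptions that are not confirmed by the logs: that a rescaled dynamic quota is the configured quota divided by the sum of all configured dynamic quotas, and that a group's slot quota is its rescaled fraction times the weighted pool size. Note also that the log excerpts (11:47) and the accounting table (11:53) were taken a few minutes apart.

```python
# Values copied from the log excerpts [1], [2] and the accounting table [3].
configured = {"group_ATLAS": 0.33, "group_CMS": 0.38}
rescaled = {"group_ATLAS": 0.111864, "group_CMS": 0.128814}

# Assumption: rescaled = configured / (sum of all configured dynamic quotas),
# so the sum can be recovered as configured / rescaled.
implied_sums = {g: configured[g] / rescaled[g] for g in configured}

# Sum of the "Config Quota" column for the groups that appear in [3].
listed_quota_sum = sum([0.80, 0.04, 0.90, 0.33, 0.38, 0.00, 0.00])

# Assumption: slot quota = rescaled fraction * weighted pool size.
atlas_slot_quota = 1942.86  # from "is using its quota 1942.86" in [1]
implied_pool_size = atlas_slot_quota / rescaled["group_ATLAS"]

# "Res In Use" for group_ATLAS in [3] compared with its slot quota.
atlas_in_use = 1953
atlas_over_quota = atlas_in_use - atlas_slot_quota

for group, total in implied_sums.items():
    print(f"{group}: implied quota sum {total:.2f}")
print(f"quota sum of groups listed in [3]: {listed_quota_sum:.2f}")
print(f"implied weighted pool size: {implied_pool_size:.0f}")
print(f"group_ATLAS slots above quota: {atlas_over_quota:.2f}")
```

Under these assumptions, both rescale lines imply a quota sum of about 2.95, while the groups listed in [3] add up to 2.45, and group_ATLAS sits slightly above its 1942.86 slot quota, which would be consistent with the "halting negotiation" line in [1].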
