[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[HTCondor-users] Negotiator only allocating 1 job per machine per cycle



Hi Everyone,

 

We have been submitting a batch of jobs with following characteristics:

·         Single variable within the batch, e.g. queue var in (var1, var2, var3, …….)

·         Assign a priority equal to the process number e.g. priority=$(Process)

·         Request 3 cpus e.g. request_cpus=3

 

What we are seeing is that the jobs are being launched by HTCondor in a staggered manner, one on each machine on the grid every cycle. Here is an example of the negotiator log.

 

08/26/21 17:02:01 ---------- Started Negotiation Cycle ----------

08/26/21 17:02:01 Phase 1:  Obtaining ads from collector ...

08/26/21 17:02:01   Getting startd private ads ...

08/26/21 17:02:01   Getting Scheduler, Submitter and Machine ads ...

08/26/21 17:02:01   Sorting 7 ads ...

08/26/21 17:02:01 Got ads: 7 public and 3 private

08/26/21 17:02:01 Public ads include 1 submitter, 3 startd

08/26/21 17:02:01 Phase 2:  Performing accounting ...

08/26/21 17:02:01 group quotas: assigning 1 submitters to accounting groups

08/26/21 17:02:01 group quotas: assigning group quotas from 84 available weighted slots

08/26/21 17:02:01 group quotas: allocation round 1

08/26/21 17:02:01 group quotas: groups= 6  requesting= 1  served= 1  unserved= 0  slots= 84  requested= 123  allocated= 84  surplus= 0  maxdelta= 84

08/26/21 17:02:01 group quotas: entering RR iteration n= 84

08/26/21 17:02:01 Group group_0 - skipping, zero slots allocated

08/26/21 17:02:01 Group group_1 - BEGIN NEGOTIATION

08/26/21 17:02:01 Phase 3:  Sorting submitter ads by priority ...

08/26/21 17:02:01 Starting prefetch round; 1 potential prefetches to do.

08/26/21 17:02:01 Starting prefetch negotiation for group_1.YYYY@xxxxxxxxxxxxx

08/26/21 17:02:01     Got NO_MORE_JOBS;  schedd has no more requests

08/26/21 17:02:01 Prefetch summary: 1 attempted, 1 successful.

08/26/21 17:02:01 Phase 4.1:  Negotiating with schedds ...

08/26/21 17:02:01   Negotiating with group_1.YYYY@xxxxxxxxxxxx at <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3>

08/26/21 17:02:01 0 seconds so far for this submitter

08/26/21 17:02:01 0 seconds so far for this schedd

08/26/21 17:02:01     Request 400158.00040: autocluster 4430 (request count 1 of 41)

08/26/21 17:02:01       Matched 400158.40 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3> preempting none <10.1.26.85:9618?addrs=10.1.26.85-9618&noUDP&sock=4212_6bc2_3> slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:02:01       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:02:01     Request 400158.00040: autocluster 4430 (request count 2 of 41)

08/26/21 17:02:01       Matched 400158.40 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3> preempting none <10.1.26.84:9618?addrs=10.1.26.84-9618&noUDP&sock=4916_7db8_3> slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:02:01       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:02:01     Request 400158.00040: autocluster 4430 (request count 3 of 41)

08/26/21 17:02:01       Matched 400158.40 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3> preempting none <10.1.26.86:9618?addrs=10.1.26.86-9618&noUDP&sock=5004_7ff6_3> slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:02:01       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:02:01     Request 400158.00040: autocluster 4430 (request count 4 of 41)

08/26/21 17:02:01       Rejected 400158.40 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3>: no match found

08/26/21 17:02:01  negotiateWithGroup resources used submitterAds length 1

08/26/21 17:02:01 Group group_2 - skipping, zero slots allocated

08/26/21 17:02:01 Group group_3 - skipping, zero slots allocated

08/26/21 17:02:01 Group group_4 - skipping, zero slots allocated

08/26/21 17:02:01 Group <none> - skipping, zero slots allocated

08/26/21 17:02:01 Round 1 totals: allocated= 84  usage= 84

08/26/21 17:02:01 ---------- Finished Negotiation Cycle ----------

08/26/21 17:03:01 ---------- Started Negotiation Cycle ----------

08/26/21 17:03:01 Phase 1:  Obtaining ads from collector ...

08/26/21 17:03:01   Getting startd private ads ...

08/26/21 17:03:01   Getting Scheduler, Submitter and Machine ads ...

08/26/21 17:03:01   Sorting 10 ads ...

08/26/21 17:03:01 Got ads: 10 public and 6 private

08/26/21 17:03:01 Public ads include 1 submitter, 6 startd

08/26/21 17:03:01 Phase 2:  Performing accounting ...

08/26/21 17:03:01 group quotas: assigning 1 submitters to accounting groups

08/26/21 17:03:01 group quotas: assigning group quotas from 84 available weighted slots

08/26/21 17:03:01 group quotas: allocation round 1

08/26/21 17:03:01 group quotas: groups= 6  requesting= 1  served= 1  unserved= 0  slots= 78  requested= 123  allocated= 84  surplus= 0  maxdelta= 75

08/26/21 17:03:01 group quotas: entering RR iteration n= 75

08/26/21 17:03:01 Group group_0 - skipping, zero slots allocated

08/26/21 17:03:01 Group group_2 - skipping, zero slots allocated

08/26/21 17:03:01 Group group_1 - BEGIN NEGOTIATION

08/26/21 17:03:01 Phase 3:  Sorting submitter ads by priority ...

08/26/21 17:03:01 Starting prefetch round; 1 potential prefetches to do.

08/26/21 17:03:01 Starting prefetch negotiation for group_1.YYYY@xxxxxxxxxxxxx

08/26/21 17:03:01     Got NO_MORE_JOBS;  schedd has no more requests

08/26/21 17:03:01 Prefetch summary: 1 attempted, 1 successful.

08/26/21 17:03:01 Phase 4.1:  Negotiating with schedds ...

08/26/21 17:03:01   Negotiating with group_1.YYYY@xxxxxxxxxxxx at <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3>

08/26/21 17:03:01 0 seconds so far for this submitter

08/26/21 17:03:01 0 seconds so far for this schedd

08/26/21 17:03:01     Request 400158.00037: autocluster 4430 (request count 1 of 38)

08/26/21 17:03:01       Matched 400158.37 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3> preempting none <10.1.26.85:9618?addrs=10.1.26.85-9618&noUDP&sock=4212_6bc2_3> slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:03:01       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:03:01     Request 400158.00037: autocluster 4430 (request count 2 of 38)

08/26/21 17:03:01       Matched 400158.37 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3> preempting none <10.1.26.84:9618?addrs=10.1.26.84-9618&noUDP&sock=4916_7db8_3> slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:03:01       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:03:01     Request 400158.00037: autocluster 4430 (request count 3 of 38)

08/26/21 17:03:01       Matched 400158.37 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3> preempting none <10.1.26.86:9618?addrs=10.1.26.86-9618&noUDP&sock=5004_7ff6_3> slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:03:01       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx

08/26/21 17:03:01     Request 400158.00037: autocluster 4430 (request count 4 of 38)

08/26/21 17:03:01       Rejected 400158.37 group_1.YYYY@xxxxxxxxxxxx <10.1.23.222:9618?addrs=10.1.23.222-9618&noUDP&sock=8396_d680_3>: no match found

08/26/21 17:03:01  negotiateWithGroup resources used submitterAds length 1

08/26/21 17:03:01 Group group_3 - skipping, zero slots allocated

08/26/21 17:03:01 Group group_4 - skipping, zero slots allocated

08/26/21 17:03:01 Group <none> - skipping, zero slots allocated

08/26/21 17:03:01 Round 1 totals: allocated= 84  usage= 84

08/26/21 17:03:01 ---------- Finished Negotiation Cycle ----------

 

 

Previously we had not been seeing this when we were launching jobs in individual batches, but also without the priority nor the cpu request commands. Is either of these job commands responsible for the staggered launch?

 

Thanks,

 

Lachlan

 

This communication (both the message and any attachments or links) is confidential and only intended for the use of the person or persons to whom it is addressed unless we have expressly authorized otherwise. It also may contain information that is protected by solicitor-client privilege. If you are reading this communication and are not an addressee or authorized representative of an addressee, we hereby notify you that any distribution, copying or other use of it without our express authorization is strictly prohibited. If you have received this communication in error, please delete both the message and any attachments from your system and notify us immediately by e-mail or phone. In addition, we note that this communication and its transmission of data have not been secured by encryption. Therefore, we are not able to confirm or guarantee that the communication has not been intercepted, amended, or read by an unintended third party.