
Re: [HTCondor-users] Looking for negotiator optimization setting

Hello All,

I've looked into this a bit further and found that there isn't currently a setting to end negotiation with a submitter once their minimum job weight is greater than their remaining quota. Furthermore, this seems like it would take a significant amount of work to add, because the Negotiator is only aware of aggregate job stats at the point it considers moving on. The Schedd would need to calculate and send a minJobWeight each time, which could then be used the same way as minSlotWeight. Luckily, the workaround of using a low NEGOTIATOR_MAX_TIME_PER_SUBMITTER has helped significantly.

As for the negotiator not moving on from a submitter when their remaining quota is less than the smallest available slot weight: it tries to do this, but runs into the following issues:

1. When calculating minSlotWeight, the negotiator doesn't treat partitionable slots differently (see GetSlotWeight for reference). This leads to the following problems when weight is based on Cpus:

a. Completely full partitionable slots have a weight of zero, which drives minSlotWeight to zero and causes my original issue.

b. Mostly-empty partitionable slots have a large weight, when in reality they can accept small jobs at a lower cost if they have a consumption policy enabled. This could lead to negotiation with a submitter being ended while jobs can still be matched (e.g. if the pool consists entirely of large, empty partitionable slots with CONSUMPTION_POLICY = True, and the submitter receives a quota smaller than one host's weight).

2. The negotiator does not recalculate minSlotWeight after trimming slots via NEGOTIATOR_SLOT_POOLSIZE_CONSTRAINT, meaning the minimum could be higher for the slots that are actually being considered. This recalculation would have prevented 1a for us, since we trim full slots (Cpus == 0).
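To make the two issues above concrete, here is a minimal Python sketch. This is not HTCondor source; the slot dictionaries, attribute names, and the assumption that a partitionable slot's smallest acceptable carve-out is one cpu are all illustrative.

```python
# Illustrative sketch only -- not HTCondor's actual GetSlotWeight code.
# Slots are plain dicts; weight is based on Cpus, as in our pool.

def slot_weight(slot):
    # Default SLOT_WEIGHT is Cpus
    return slot["Cpus"]

def min_slot_weight(slots):
    """Minimum weight over the slots under consideration.

    - Skip completely full slots (weight 0) so they can't drag the
      minimum to zero (issue 1a).
    - For partitionable slots with a consumption policy, use the
      smallest acceptable carve-out (assumed 1 cpu here) instead of
      the slot's full remaining weight (issue 1b).
    """
    weights = []
    for s in slots:
        w = slot_weight(s)
        if w <= 0:
            continue  # full slot: ignore rather than return 0
        if s.get("PartitionableSlot") and s.get("ConsumptionPolicy"):
            weights.append(1)  # smallest request a pslot could accept
        else:
            weights.append(w)
    return min(weights, default=0)

pool = [
    {"Cpus": 0,  "PartitionableSlot": True, "ConsumptionPolicy": True},  # full
    {"Cpus": 32, "PartitionableSlot": True, "ConsumptionPolicy": True},  # empty
    {"Cpus": 4},                                                         # static
]

# Recomputing after trimming (e.g. a pool-size constraint that drops
# Cpus == 0 slots, issue 2) yields 1 here instead of the naive 0:
trimmed = [s for s in pool if s["Cpus"] > 0]
print(min_slot_weight(trimmed))  # prints 1
```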

Hope this helps anyone else with this issue.


On Fri, Jun 15, 2018 at 4:11 PM, Collin Mehring <collin.mehring@xxxxxxxxxxxxxx> wrote:
Dear Condor experts,

I've noticed the following scenario in our NegotiatorLog leading to a longer than usual negotiation cycle duration, and was wondering if there's a setting/knob that could help.

Example: (using HTCondor v8.6.2)

A submitter has several thousand idle, ready jobs (that are not clustered well) and a few hundred running jobs. The running jobs/slots used are weighted by Cpus (the default) and are consuming almost all of the submitter's quota, so when negotiation with this submitter starts they receive a small limit. The submitter belongs to an accounting group, but they are the only one in that group. The accounting group is using a dynamic allocation and accepts surplus, leading to fractions of slots being assigned.

Here are the relevant lines from the start of negotiation, if preferred:
06/14/18 18:06:54 0 seconds so far for this submitter
06/14/18 18:06:54 1 seconds so far for this schedd
06/14/18 18:06:54   maxAllowed= 60.7366  groupQuota= 1284.74  groupusage= 1224
06/14/18 18:06:54   Calculating submitter limit with the following parameters
06/14/18 18:06:54     SubmitterPrio       = 1344791.875000
06/14/18 18:06:54     SubmitterPrioFactor = 1000.000000
06/14/18 18:06:54     submitterShare      = 1.000000
06/14/18 18:06:54     submitterAbsShare   = 1.000000
06/14/18 18:06:54     submitterLimit      = 60.736572
06/14/18 18:06:54     submitterUsage      = 1224.000000

This submitter then quickly matches three of the first ~20 ready jobs, at costs of 32, 19, and 9, which uses up almost all of its limit.

The next job considered after the third match:
06/14/18 18:06:54 matchmakingAlgorithm: limit 60.736572 used 60.000000 pieLeft 0.736572

The submitter has run for less than a second at this point and can no longer match any additional jobs, because the smallest possible weighted request it can make is one core. (Though it's also possible for this situation to occur when pieLeft is >1 if the submitter only has larger/heavier jobs ready to run.) We use an expression for RequestCpus, but set SCHEDD_SLOT_WEIGHT such that the schedds are still able to accurately weigh the jobs before sending them to the negotiator.

I'm looking for a setting that would cause the negotiator to immediately end negotiation with a submitter when all remaining individual job weights for that submitter are greater than the remaining quota (pieLeft). This already happens if pieLeft hits zero, but they usually end up with some fraction of a slot as shown here.

Alternatively, if I could define a minimum required pieLeft for it to continue negotiating, that would also solve the most common occurrence.
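As a sketch of the rule I'm asking for (the function and its arguments are stand-ins for illustration, not actual negotiator state or config): end negotiation when pieLeft drops to or below a configurable floor, or when even the cheapest remaining idle job won't fit.

```python
# Illustrative sketch of the requested early-exit rule; pie_left and
# the job-weight list are stand-ins, not actual negotiator internals.

def should_end_negotiation(pie_left, idle_job_weights, min_pie_left=0.0):
    """Stop considering this submitter's jobs when either:
    - pieLeft has fallen to or below a configured floor, or
    - the cheapest remaining idle job still costs more than pieLeft.
    """
    if pie_left <= min_pie_left:
        return True
    return bool(idle_job_weights) and min(idle_job_weights) > pie_left

# The log above: limit 60.736572, used 60.0, so pieLeft = 0.736572,
# while the smallest possible weighted request is one core:
print(should_end_negotiation(0.736572, [1, 2, 32]))  # prints True
```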

In this example the negotiator continued to consider jobs from this submitter for another 44 seconds without making any additional matches. This seems long to me even if every job is considered, because it should be able to just immediately reject all of them for the submitter limit. Other submitters also experienced this issue, but this one had the most idle jobs and thus took the longest.

The last few lines before it moves to the next submitter:
06/14/18 18:07:38     Sending SEND_RESOURCE_REQUEST_LIST/20/eom
06/14/18 18:07:38     Getting reply from schedd ...
06/14/18 18:07:38     Got NO_MORE_JOBS; schedd has no more requests
06/14/18 18:07:38   This submitter hit its submitterLimit.

My short-term solution is to lower the timeout period for submitters to mitigate the issue, as none of them legitimately take this long.
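For reference, that short-term mitigation is a one-line negotiator config change. The knob itself is real; the 20-second value below is only an example, to be tuned to how long a healthy submitter legitimately needs:

```
# condor_config on the negotiator: cap the time (seconds) spent on any
# single submitter per negotiation cycle. Example value only.
NEGOTIATOR_MAX_TIME_PER_SUBMITTER = 20
```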

It's always possible I'm approaching this wrong, or there's some other setting I'm missing. I'd love to hear your thoughts.