
Re: [HTCondor-users] Negotiator only allocating 1 job per machine per cycle



Hello Todd:

 

Excuse me for inserting myself into the discussion that you are having with Knoeller and Ho.

My big need and question is how to send a given number of jobs to a specific machine.

For example: I have 20 jobs to submit but I want 4 to go to machine A, 7 to machine B and so on.

I assume that the number of cores of a given machine has to be less than or equal to the number of jobs assigned to it, but correct me if I am wrong.

Ultimately, I want to know whether there is a way of doing this, which instructions I would have to add, and where (that is, in the condor_config or in the submit file).

Help would be much appreciated. Knoeller and Ho: please feel free to send any comments or suggestions of things worth trying.
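
For illustration only, here is one way the "4 jobs to machine A, 7 to machine B" split could be expressed in a submit file. This is a sketch under assumptions, not an answer given in this thread: it relies on the `requirements` submit command, the `Machine` slot attribute, and multiple `queue` statements in one submit file. The helper name `build_submit_description`, the executable `job.sh`, and the host names are hypothetical. Note that `requirements` only restricts where jobs may run; how many run at once still depends on the slots available on each machine.

```python
def build_submit_description(executable, jobs_per_machine):
    """Return submit-file text that queues N jobs pinned to each machine.

    jobs_per_machine maps a (hypothetical) fully qualified host name to the
    number of jobs that should be restricted to that host.
    """
    lines = [f"executable = {executable}", ""]
    for machine, count in jobs_per_machine.items():
        if count < 1:
            raise ValueError(f"job count for {machine} must be at least 1")
        # Restrict the jobs queued next to slots advertised by this machine.
        lines.append(f'requirements = (Machine == "{machine}")')
        lines.append(f"queue {count}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hosts and executable, for illustration only.
    print(build_submit_description(
        "job.sh",
        {"machineA.example.com": 4, "machineB.example.com": 7},
    ))
```

The printed text could be saved to a file and passed to condor_submit; whether this fits a given pool's policy is something the pool administrator would need to confirm.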

Sincerely

jjv

 

Julio J. Valdés

National Research Council Canada                                    | Conseil National de Recherches Canada

Digital Technologies Research Centre                               | Centre de Recherche en Technologies Numériques

Data Science for Complex Systems Group                        | Science des Données pour les Systèmes Complexes

M-50, 1200 Montreal Road, Ottawa, Ontario K1A 0R6 | M-50, 1200 chemin Montréal, Ottawa, Ontario K1A 0R6

Canada                                                                                     | Canada

julio.valdes@xxxxxxxxxxxxxx

tel/tél: (1)613-993-0257

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Tuesday, August 31, 2021 1:45 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; John M Knoeller <johnkn@xxxxxxxxxxx>
Cc: Isaac Ho <IHo@xxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Negotiator only allocating 1 job per machine per cycle

 


On 8/31/2021 9:12 AM, John M Knoeller wrote:

Ok, yes,  when a concurrency limit is in place, the Negotiator is responsible for managing the limits and the Schedd won't be able to start more than one job for each match. 

 


The above is a limitation we hope to remove at some point in the future.  In the meantime, there is a workaround available, which is to enable the "consumption policies" mechanism.  The good news is that with consumption policies enabled, multiple jobs can be started per server per cycle even with concurrency limits enabled.  The bad news is that consumption policies are off by default because
   a) enabling them reduces scalability, which could be an issue for large pools with many thousands of slots, and
   b) it is a mechanism we may drop at some point in the future because it favors a centralized architecture where the negotiator has more control, and moving forward we want to emphasize a more decentralized approach where the schedd has more control.

If the pros outweigh the cons for your situation, and you wish to enable consumption policies in your pool, add the following to the condor config on all your execute nodes:
  

  # Disable CLAIM_PARTITIONABLE_LEFTOVERS and instead enable
  # Consumption Policies so that concurrency limits behavior
  # can assign multiple jobs per pslot per negotiator cycle.
  CLAIM_PARTITIONABLE_LEFTOVERS = False
  CONSUMPTION_POLICY = True

After making the above change, I believe you will need to restart HTCondor (condor_restart).  I don't think the above config change can be applied on-the-fly with condor_reconfig.
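
Before restarting, it may help to confirm that the config fragment on each execute node really contains both settings. The following is a minimal sketch, assuming the fragment is a plain file of `NAME = VALUE` lines; the helper names `parse_condor_config` and `consumption_policy_enabled` are hypothetical and not part of HTCondor. It does not follow includes or macro expansion, so the value reported by condor_config_val on the node remains the authoritative answer.

```python
def parse_condor_config(text):
    """Parse simple NAME = VALUE lines, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Config knob names are treated case-insensitively here.
        settings[name.strip().upper()] = value.strip()
    return settings


def consumption_policy_enabled(text):
    """True only if both knobs are set as recommended in the message above."""
    settings = parse_condor_config(text)
    leftovers = settings.get("CLAIM_PARTITIONABLE_LEFTOVERS", "").lower()
    policy = settings.get("CONSUMPTION_POLICY", "").lower()
    return leftovers == "false" and policy == "true"


if __name__ == "__main__":
    fragment = """
    # Consumption policies for concurrency limits
    CLAIM_PARTITIONABLE_LEFTOVERS = False
    CONSUMPTION_POLICY = True
    """
    print(consumption_policy_enabled(fragment))
```

A fragment that sets only one of the two knobs is reported as not enabled, which matches the advice that both lines are needed together.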

More details on consumption policies are in the Manual at:
  
  https://htcondor.readthedocs.io/en/latest/admin-manual/policy-configuration.html?condor-negotiator-side-resource-consumption-policies#condor-negotiator-side-resource-consumption-policies

You can also see more discussions about this by searching the htcondor-users email archive for CONSUMPTION_POLICIES like so:

  https://www-auth.cs.wisc.edu/lists/htcondor-users/htdig/search.shtml#gsc.tab=0&gsc.q=CONSUMPTION_POLICIES&gsc.sort=

Hope the above helps,
Todd