
Re: [HTCondor-users] HTCondor Concurrency Limits



On 10/3/2016 2:54 AM, Jason Liu wrote:
> Hi Todd,
> 
> Thanks for your help.
> 
> I tried setting CONSUMPTION_POLICY = True. You are right that it improved a little:
> 
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> 4019.0   spookfish      10/3  15:42   0+00:02:24 R  0   97.7 sample_load.a 1 0
> 4019.1   spookfish      10/3  15:42   0+00:02:24 R  0   97.7 sample_load.a 1 1
> 4019.2   spookfish      10/3  15:42   0+00:01:24 R  0   97.7 sample_load.a 1 2
> 4019.3   spookfish      10/3  15:42   0+00:01:24 R  0   97.7 sample_load.a 1 3
> 4019.4   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 4
> 4019.5   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 5
> 4019.6   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 6
> 4019.7   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 7
> 
> 8 jobs; 0 completed, 0 removed, 4 idle, 4 running, 0 held, 0 suspended
> 
> 
> I have set each of the jobs to consume 25% of total custom resource. Now it seems instead of matching one job per matchmaking cycle, it was matching two :) when it can clearly run up to 4 jobs at the same time.
> 
> 
> Any more insights?
>


Hi Jason,

I tried the same type of setup you have. I made a v8.4.7 personal condor pool with one partitionable slot w/ 8 cpus and a custom concurrency limit set at 100 "licenses", then submitted 8 jobs where each job requested 25 licenses.  No more than four jobs ran at a time, as expected.  With CONSUMPTION_POLICY=False (default), one job started per negotiation cycle.  With CONSUMPTION_POLICY=True, all four jobs started in one negotiation cycle.  My condor_config and submit file entries appear below.
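
To make the arithmetic concrete, here is a toy model of what I observed. It is only a sketch and not HTCondor's actual negotiator logic: the function name simulate_cycles and its parameters are made up for illustration, and it assumes a single partitionable slot, one cpu per job, and at most one match per cycle for that slot when consumption policies are off.

```python
def simulate_cycles(total_limit, per_job, n_jobs, cpus, consumption_policy):
    """Toy model: return how many jobs start in each negotiation cycle.

    Assumptions (illustrative only, not HTCondor internals):
      - one partitionable slot with `cpus` cores, one core per job
      - a concurrency limit of `total_limit` units, `per_job` units per job
      - without consumption policies, the slot gets one match per cycle
    """
    running = 0
    idle = n_jobs
    started_per_cycle = []
    while True:
        # How many more jobs fit under the concurrency limit and the cpu count
        by_limit = (total_limit - running * per_job) // per_job
        by_cpus = cpus - running
        can_start = min(idle, by_limit, by_cpus)
        if can_start <= 0:
            break
        if not consumption_policy:
            # Assumed behaviour: one match per partitionable slot per cycle
            can_start = 1
        started_per_cycle.append(can_start)
        running += can_start
        idle -= can_start
    return started_per_cycle


if __name__ == "__main__":
    # 100 licenses, 25 per job, 8 jobs, 8 cpus
    print(simulate_cycles(100, 25, 8, 8, consumption_policy=False))
    print(simulate_cycles(100, 25, 8, 8, consumption_policy=True))
```

With the numbers from my test (100 licenses, 25 per job), the limit caps things at 100 / 25 = 4 running jobs either way; the only difference in this model is how many cycles it takes to get there.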

So you said you are running HTCondor v8.4.7; are you running v8.4.7 everywhere, including your central manager, or is your central manager running an older version of HTCondor?  If it is running something that predates v8.4, then due to this ticket https://is.gd/IPj0Ia you may also need to add NEGOTIATOR_MATCHLIST_CACHING=false to your central manager config.

Also, did you set CONSUMPTION_POLICY=True on ALL of your execute nodes and then restart HTCondor on those nodes?  (I am not sure if condor_reconfig is enough.)  Does
  condor_status -cons 'isUndefined(ConsumptionCpus)'
return any output? (It should not if CONSUMPTION_POLICY is successfully enabled on all execute nodes.)

So my test setup ---

In condor_config.local, I have:

 # Personal HTCondor setup; run a private pool on my laptop
 network_interface = 127.0.0.1
 condor_host = 127.0.0.1
 daemon_list = MASTER,COLLECTOR,STARTD,SCHEDD,NEGOTIATOR
 # Set a concurrency limit called only100 w/ 100 licenses
 only100_limit = 100
 # Enable consumption policies
 consumption_policy = True
 # Create one partitionable slot w/ 8 cpu cores
 num_cpus = 8
 num_slots=1
 num_slots_type_1=1
 slot_type_1=100%
 slot_type_1_partitionable = true

and here is my test job submit file (in file limit.sub):

 executable = c:\utils\sleep.exe
 arguments = 500000
 transfer_executable = false
 concurrency_limits = only100:25
 queue 8

And here we go....

C:\condor\test>condor_version
$CondorVersion: 8.4.7 Jun 03 2016 BuildID: 369249 $
$CondorPlatform: x86_64_Windows8 $

C:\condor\test>condor_submit limit.sub
Submitting job(s)........
8 job(s) submitted to cluster 153.

C:\condor\test>condor_q


-- Schedd: ToddsThinkpad : <127.0.0.1:50145?...
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 153.0   tannenba       10/3  15:36   0+00:00:07 R  0   0.0  sleep.exe 500000
 153.1   tannenba       10/3  15:36   0+00:00:07 R  0   0.0  sleep.exe 500000
 153.2   tannenba       10/3  15:36   0+00:00:07 R  0   0.0  sleep.exe 500000
 153.3   tannenba       10/3  15:36   0+00:00:07 R  0   0.0  sleep.exe 500000
 153.4   tannenba       10/3  15:36   0+00:00:00 I  0   0.0  sleep.exe 500000
 153.5   tannenba       10/3  15:36   0+00:00:00 I  0   0.0  sleep.exe 500000
 153.6   tannenba       10/3  15:36   0+00:00:00 I  0   0.0  sleep.exe 500000
 153.7   tannenba       10/3  15:36   0+00:00:00 I  0   0.0  sleep.exe 500000

8 jobs; 0 completed, 0 removed, 4 idle, 4 running, 0 held, 0 suspended