
Re: [HTCondor-users] HTCondor Concurrency Limits



>
>On 9/19/2016 11:35 PM, Jason Liu wrote:
>> This tells me when concurrency limit is enabled, condor is matching one
>> job at a time, and the matchmaking cycle is something like 20 seconds to
>> 1 minute. In our production cluster we need to push through 100k jobs in
>> a day. Obviously matching one job per minute is not very scalable. So I
>> am wondering if there is anything I have done wrong here.
>>
>> We are running condor 8.4.7 on Ubuntu 14.04.
>>
>> Thanks in advance for any help.
>>
>
>Are your execute nodes (startds) configured to use partitionable slots?
>If so, there are known issues combining concurrency limits and
>partitionable slots, including the behavior you saw above.  We plan to
>improve this in a future release, but for the moment I think you can
>work around these problems by adding the following line to your
>condor_config :
>
>    CONSUMPTION_POLICY = True
>
>I think you will need to do this on all of your execute nodes (startds),
>then do a condor_reconfig or condor_restart.  Try doing this and
>reporting back, I think you will find improvement. For a lot of detail
>about what is happening here, see
>
>   https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ConsumptionPolicies
>
>regards,
>Todd
>

Hi Todd,

Thanks for your help.

I tried setting CONSUMPTION_POLICY = True, and you are right that it improved things a little:

 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
4019.0   spookfish      10/3  15:42   0+00:02:24 R  0   97.7 sample_load.a 1 0
4019.1   spookfish      10/3  15:42   0+00:02:24 R  0   97.7 sample_load.a 1 1
4019.2   spookfish      10/3  15:42   0+00:01:24 R  0   97.7 sample_load.a 1 2
4019.3   spookfish      10/3  15:42   0+00:01:24 R  0   97.7 sample_load.a 1 3
4019.4   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 4
4019.5   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 5
4019.6   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 6
4019.7   spookfish      10/3  15:42   0+00:00:00 I  0   0.1  sample_load.a 1 7

8 jobs; 0 completed, 0 removed, 4 idle, 4 running, 0 held, 0 suspended


I have set each job to consume 25% of the total custom resource. Now, instead of matching one job per matchmaking cycle, it seems to be matching two :) even though the limit clearly allows up to 4 jobs to run at the same time.
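
For a rough sense of scale, here is a back-of-the-envelope sketch of what a fixed number of matches per cycle means for daily throughput. It assumes the 20 second to 1 minute cycle length and the 100k jobs/day target from my original message, and it assumes the matches-per-cycle figure stays constant, which is only a guess; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TARGET_JOBS_PER_DAY = 100_000  # throughput target from the original message


def jobs_per_day(matches_per_cycle, cycle_seconds):
    """Upper bound on jobs started per day, assuming every matchmaking
    cycle starts exactly `matches_per_cycle` jobs (an assumption)."""
    cycles_per_day = SECONDS_PER_DAY // cycle_seconds
    return matches_per_cycle * cycles_per_day


if __name__ == "__main__":
    for cycle_seconds in (20, 60):      # observed range of cycle lengths
        for matches in (1, 2, 4):       # before, now, and what the limit allows
            rate = jobs_per_day(matches, cycle_seconds)
            status = "meets" if rate >= TARGET_JOBS_PER_DAY else "below"
            print(f"{matches} match(es) per {cycle_seconds}s cycle: "
                  f"{rate} jobs/day ({status} target)")
```

Even at two matches per cycle, this estimate stays well under the target for both cycle lengths, so the per-cycle match count is what I need to raise.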


Any more insights?

Thanks
Jason
