Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] soft/hard limiting cpu.shares ?

Date: Thu, 17 Jun 2021 10:14:44 +0200
From: thomas.hartmann@xxxxxxx
Subject: Re: [HTCondor-users] soft/hard limiting cpu.shares ?

Hi Tom,

many thanks for the confirmation :)

tbh, we probably want to have our cake and eat it too, i.e., have a somewhat hard limit while staying lenient towards our users... We will probably play a bit with the cgroup values and see how the system evolves.


Cheers and thanks,
  Thomas



On 16/06/2021 17.44, tpdownes@xxxxxxxxx wrote:

Thomas:

You understand the cpu shares mechanism correctly. It's a soft limit with a policy for resolving conflict when conflict arises.
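To make the "soft limit" behaviour concrete, here is a minimal sketch of a work-conserving, share-weighted allocation. It is a simplified model and not the kernel scheduler itself; the function name `allocate_cpu`, the group names, and the demand figures are all made up for illustration. It only shows the idea discussed in this thread: shares decide the split when sibling cgroups compete, and idle capacity can be used by whoever asks for it.

```python
def allocate_cpu(capacity, groups):
    """Simplified water-filling model of cpu.shares among sibling cgroups.

    capacity: total CPU available, in cores.
    groups:   {name: (shares, demand_in_cores)}  (hypothetical inputs)
    Returns   {name: cores_granted}.
    """
    granted = {name: 0.0 for name in groups}
    # Only groups that actually want CPU take part in the split.
    active = {name for name, (_, demand) in groups.items() if demand > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_shares = sum(groups[name][0] for name in active)
        # Offer each still-hungry group a slice proportional to its shares.
        offers = {name: remaining * groups[name][0] / total_shares
                  for name in active}
        satisfied = {name for name in active
                     if offers[name] >= groups[name][1] - 1e-9}
        if not satisfied:
            # Everyone wants more than offered: contention, shares decide.
            for name in active:
                granted[name] = offers[name]
            break
        # Cap satisfied groups at their demand; leftovers go to the rest.
        for name in satisfied:
            granted[name] = float(groups[name][1])
            remaining -= groups[name][1]
        active -= satisfied
    return granted


# One busy job next to an idle sibling: the busy job may exceed its nominal half.
print(allocate_cpu(4, {"slot_a": (100, 4), "slot_b": (100, 0)}))
# Both jobs busy: the split falls back to the share ratio.
print(allocate_cpu(4, {"slot_a": (100, 4), "slot_b": (100, 4)}))
```

In this model the first call grants all 4 cores to `slot_a`, while the second grants 2 cores to each slot, which matches the behaviour described above: the nominal share only binds once the machine is fully loaded.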

If you really want to nail down HTCondor jobs to a total number of cores, you want to use cpu.cfs_quota_us (and optionally cpu.cfs_period_us) on the parent htcondor cgroup. This is an honest-to-goodness hard limit on CPU usage that works in parallel with the shares mechanism.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu


Short version: to assign 1 core to the cgroup, set the quota equal to the period, e.g. a quota of 1000000 with cpu.cfs_period_us also set to 1000000.

Within the htcondor cgroup, shares will be enforced by HTCondor but the overall limit will be applied at the parent level.
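As a rough sketch of the quota arithmetic: the hard limit in cores is the ratio cpu.cfs_quota_us / cpu.cfs_period_us, so the quota to write depends on the period in effect. The helper names below (`cfs_quota_us`, `cores_from_quota`, `apply_hard_limit`) are hypothetical, the 100000 µs default period and the 1000 to 1000000 µs bounds are what I understand the cgroup v1 CPU controller to use, and the cgroup directory is passed in because the mount point differs between systems (something like /sys/fs/cgroup/cpu/htcondor is an assumption, and writing there needs root).

```python
from pathlib import Path

DEFAULT_PERIOD_US = 100_000  # assumed kernel default CFS period (100 ms)


def cfs_quota_us(cores, period_us=DEFAULT_PERIOD_US):
    """Return the cpu.cfs_quota_us value that caps a cgroup at `cores` CPUs."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 1_000 <= period_us <= 1_000_000:
        raise ValueError("period must be between 1 ms and 1 s")
    quota = int(round(cores * period_us))
    if quota < 1_000:
        raise ValueError("resulting quota is below the 1 ms minimum")
    return quota


def cores_from_quota(quota_us, period_us):
    """Inverse direction: how many cores a quota/period pair allows."""
    if quota_us < 0:
        return None  # a negative quota (-1) means "no hard limit"
    return quota_us / period_us


def apply_hard_limit(cgroup_dir, cores, period_us=DEFAULT_PERIOD_US):
    """Write period and quota into a cgroup v1 cpu directory (path is an assumption)."""
    cgroup = Path(cgroup_dir)
    (cgroup / "cpu.cfs_period_us").write_text(f"{period_us}\n")
    (cgroup / "cpu.cfs_quota_us").write_text(f"{cfs_quota_us(cores, period_us)}\n")


print(cfs_quota_us(1, 1_000_000))            # 1000000
print(cfs_quota_us(1))                       # 100000
print(cores_from_quota(1_000_000, 100_000))  # 10.0
```

The last line is the reason to check the period before copying a quota value: the same quota of 1000000 corresponds to one core with a 1 s period, but to ten cores with a 100 ms period.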

Tom

On Wed, Jun 16, 2021 at 10:20 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:


    Hi all,

    a short question regarding scaling of a job's core time via cgroup cpu.shares:

    The relative share of a job's cgroup is only limiting with respect to
    the total core-scaled CPU time, right?

    I.e., we are running our nodes with hyperthreading 2x enabled for
    simplicity, since we use the same machines for production jobs as well
    as for user job sub-clusters.

    Since users occasionally have odd jobs (that tend to work better
    without overbooking), on user nodes we broker only 1/2 of the HT-core
    count for jobs.

    Now, the condor parent cgroup has been assigned
        htcondor/cpu.shares = 1024
    with respect to the total system share of
        cpu.shares = 1024
    so all condor child processes (without further sub-groups) could in
    principle use up to 100% of the total HT-core scaled CPU time.

    A single core job gets a relative share like

    htcondor/condor_var_lib_condor_execute_slot2_15@xxxxxxxxxxxxxxx/cpu.shares = 100
    where we broker only 50% of the total HT-core scaled time - as far
    as I see.

    However, user jobs can utilize more than their nominally assigned
    cpu share.
    My understanding is that the kernel notices that the total CPU time is
    not utilized completely and thus allows processes to use more than
    their nominal time limit, as there is still CPU time available.
    Is this correct?

    When we scale the condor parent cgroup to a reasonable fraction of the
    system cpu.share (taking HT efficiency into account), we should be able
    to scale CPU times per job to (roughly) core-equivalents - without the
    need to bind jobs to specific cores, right?

    Cheers,
      Thomas

    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/



