
Re: [HTCondor-users] htcondor cgroups and memory limits on CentOS7



Hi Thomas,

Yes, that's what I see. Ignore my first email - I had forgotten to configure partitionable slots for the little test :-) 

Regards,
Andrew.

________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Thomas Hartmann [thomas.hartmann@xxxxxxx]
Sent: Tuesday, October 24, 2017 11:25 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] htcondor cgroups and memory limits on CentOS7

Hi Andrew and Alessandra,

Here, with condor 8.6.5-2 on EL7 (3.10.0-514.26.2), the cgroup soft limit
is set according to the job's requested memory, as far as I can see.
E.g., the six 8-core slots on the node [1] have either 4GB or 12GB, as requested.

The node's condor settings for cgroups are currently
  BASE_CGROUP = /system.slice/condor.service
  CGROUP_MEMORY_LIMIT_POLICY = soft
i.e., no container universe.

Cheers,
  Thomas

[1]
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
4294967296
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
4294967296
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_3@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
4294967296
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_4@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
4294967296
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_5@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
12616466432
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
4294967296
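
As a quick sanity check on the values in [1], the byte counts can be converted to MiB/GiB. This is only a minimal sketch; the helper name `bytes_to_mib` is made up for illustration, and the two numbers are copied from the listing above.

```python
# Soft-limit values (in bytes) copied from the listing in [1].
SOFT_LIMITS_BYTES = [4294967296, 12616466432]


def bytes_to_mib(n_bytes):
    """Convert a byte count to whole MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes // (1024 * 1024)


for limit in SOFT_LIMITS_BYTES:
    mib = bytes_to_mib(limit)
    # Print the raw value next to its MiB and GiB equivalents.
    print(f"{limit} bytes = {mib} MiB = {mib / 1024:.2f} GiB")
```

The first value comes out at exactly 4096 MiB (4 GiB) and the second at 12032 MiB (11.75 GiB), which is consistent with the soft limit following requests of roughly 4GB and 12GB.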



On 2017-10-24 11:07, andrew.lahiff@xxxxxxxxxx wrote:
> Hi Alessandra,
>
> There seems to have been a change in behavior with respect to how HTCondor configures cgroups. With older versions of HTCondor, it used to set memory.soft_limit_in_bytes when using soft memory limits (at least this is what I remember).
>
> However, now (e.g. in 8.6.6) memory.soft_limit_in_bytes seems to be set to the total memory of the machine, and memory.memsw.limit_in_bytes is set to the memory that the job requested. We use the Docker universe now, so in our case it's Docker that's creating the cgroups.
>
> Regards,
> Andrew.
>
> ________________________________
> From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Alessandra Forti [Alessandra.Forti@xxxxxxx]
> Sent: Tuesday, October 24, 2017 9:39 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] htcondor cgroups and memory limits on CentOS7
>
> Hi Thomas,
>
>
> On 24/10/2017 09:17, Thomas Hartmann wrote:
>
> Hi Todd, (sorry to fork in between)
>
> I am a bit confused regarding the soft limits.
>
> So far I had assumed that the kernel would allow a cgroup to exceed its
> soft limit usage as long as there is free memory available
>
> Do you set the limit yourself, or does your HTCondor do it? My HTCondor doesn't set that limit. Maybe I'm doing something wrong.
>
> - and kill a
> group's processes if the system runs low on unwired memory (assuming a
> translation between limits in condor to cgroup limits).
>
>
> So, we have effectively not set a 'real' cgroup hard limit assuming that
> the soft limit would be sufficient, e.g., would the kernel kill [1] when
> exceeding its 4GB soft limit and running low on system-wide memory?
>
> No, the kernel doesn't kill on the soft limit. This is why SYSTEM_PERIODIC_REMOVE is needed.
>
> (looking now at the values: would memsw, set to such a large value,
> actually send the job heavily swapping...?)
>
>
> In fact, memsw is where RAM+swap is limited. However, as pointed out in the thread, you may end up with a job which has 0 memory and 4GB of swap.
>
>
> Cheers,
>   Thomas
>
>
>
> [1]
> /sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.limit_in_bytes
> 142668537856
> /sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.memsw.limit_in_bytes
> 142668541952
> /sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
> 4294967296
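
For anyone wanting to compare the three limits of a slot in one go, here is a small sketch that reads them from a cgroup directory. It assumes the cgroup v1 memory controller file names shown in [1]; the function name `read_memory_limits` is hypothetical, and the slot directory has to be passed in by hand, since its full name is elided in the listing.

```python
import sys
from pathlib import Path

# cgroup v1 memory controller files, as they appear in the listing above.
LIMIT_FILES = (
    "memory.limit_in_bytes",
    "memory.memsw.limit_in_bytes",
    "memory.soft_limit_in_bytes",
)


def read_memory_limits(cgroup_dir):
    """Return {file name: limit in bytes} for the limit files found in cgroup_dir."""
    limits = {}
    for name in LIMIT_FILES:
        path = Path(cgroup_dir) / name
        if path.is_file():  # skip files this kernel/cgroup does not expose
            limits[name] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    # Usage: pass one or more slot cgroup directories on the command line.
    for cgroup_dir in sys.argv[1:]:
        for name, value in read_memory_limits(cgroup_dir).items():
            print(f"{cgroup_dir}/{name}: {value}")
```

Missing files are skipped rather than treated as errors, since memory.memsw.limit_in_bytes is only present when swap accounting is enabled.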
>
>
> On 2017-10-20 18:26, Todd Tannenbaum wrote:
>
>
> On 10/20/2017 9:44 AM, Alessandra Forti wrote:
>
>
> Hi,
>
> is more information needed?
>
>
>
> Hi Alessandra,
>
> The version of HTCondor you are using would be helpful :).
>
> But I have some answers/suggestions below that I hope will help...
>
>
>
> * On the head node
>
> RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
> SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage)  || <OtherParameters>
>
> So there are two questions:
>
> 1) Why didn't SYSTEM_PERIODIC_REMOVE work?
>
>
> Because the (system_)periodic_remove expressions are evaluated by the
> condor_shadow while the job is running, and the *_RAW attributes are
> only updated in the condor_schedd.
>
> A simple solution is to use attribute MemoryUsage instead of
> ResidentSetSize_RAW.  So I think things will work as you want if you
> instead did:
>
>   RemoveMemoryUsage = ( MemoryUsage > 2*RequestMemory )
>   SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage)  || <OtherParameters>
>
> Note that MemoryUsage is in the same units as RequestMemory (megabytes,
> whereas ResidentSetSize_RAW is in kilobytes), so you only need to
> multiply by 2 instead of 2000.
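
The unit arithmetic behind the two expressions can be sketched as follows. This assumes ResidentSetSize_RAW is reported in KiB and MemoryUsage/RequestMemory in MiB; the helper names and the round-up in `rss_kb_to_mb` are illustrative assumptions, not HTCondor code.

```python
def rss_kb_to_mb(rss_kb):
    """Round a resident set size in KiB up to whole MiB (assumed rounding)."""
    return (rss_kb + 1023) // 1024


def over_limit_raw(rss_kb, request_memory_mb):
    """Original policy: KiB usage compared against the MiB request scaled by 2000."""
    return rss_kb > 2000 * request_memory_mb


def over_limit_mb(memory_usage_mb, request_memory_mb):
    """Suggested policy: both sides in MiB, so the factor is simply 2."""
    return memory_usage_mb > 2 * request_memory_mb


request_mb = 4096      # job requested 4 GiB
rss_kb = 9_000_000     # job is currently resident at about 8.6 GiB

print(over_limit_raw(rss_kb, request_mb))
print(over_limit_mb(rss_kb_to_mb(rss_kb), request_mb))
```

Both checks flag this example job. The thresholds are not exactly identical (2000 versus 2048 KiB per requested MiB), so the two policies can disagree for usage very close to the limit.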
>
> You are not the first person to be tripped up by this. :(  I realize it
> is not at all intuitive. I think I will add a quick patch in the code to
> allow _RAW attributes to be referenced inside of job policy expressions
> to help prevent frustration by the next person.
>
> Also you may want to place your memory limit policy on the execute nodes
> via startd policy expression, instead of having them enforced on the
> submit machine (what I think you are calling the head node).  The reason
> is the execute node policy is evaluated every five seconds, while the
> submit machine policy is evaluated every several minutes.  A runaway job
> could consume a lot of memory in a few minutes :).
>
>
>
> 2) Shouldn't htcondor set the job soft limit with this configuration?
> or is the site expected to set the soft limit separately?
>
>
>
> Personally, I think "soft" limits in cgroups are completely bogus.  The
> way the Linux kernel treats soft limits does not do in practice what
> anyone (including htcondor itself) expects.  I recommend setting
> CGROUP_MEMORY_LIMIT_POLICY to either none or hard; soft makes no sense imho.
>
> "CGROUP_MEMORY_LIMIT_POLICY=hard" is easy to understand: if the job uses more
> memory than it requested, it is __immediately__ kicked off and put on
> hold.  This way users get a consistent experience.
>
> If you want jobs to be able to go over their requested memory so long as
> the machine isn't swapping, consider disabling swap on your execute
> nodes (not a bad idea for compute servers in general) and simply leaving
> "CGROUP_MEMORY_LIMIT_POLICY=none".  What will happen is that if the system is
> stressed, eventually the Linux OOM (out-of-memory) killer will kick in
> and pick a process to kill.  HTCondor sets the OOM priority of job
> processes such that the OOM killer should always pick job processes ahead
> of other processes on the system.  Furthermore, HTCondor "captures" the
> OOM request to kill a job and only allows it to continue if the job is
> indeed using more memory than requested (i.e. provisioned in the slot).
> This is probably what you wanted by setting the limit to soft in the
> first place.
>
> I am thinking we should remove the "soft" option to CGROUP_MEMORY_LIMIT_POLICY
> in future releases; it just causes confusion imho.  Curious if others on
> the list disagree...
>
> Hope the above helps,
> regards,
> Todd
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/