Re: [HTCondor-users] spontaneous reboots after enabling cgroups



Hi,

A few thoughts-
- Why do you want swap on your worker nodes?  We found it much more useful to just disable swap and kill jobs when they went over their memory limit.
- You can set the swappiness of the /condor cgroup to 0, disabling swap only for condor jobs and processes.
- cgroups, depending on the kernel and distro, may also track memory+swap usage. We don't do this currently, but it would be a very simple change.
- We already listen for events about OOM issues in the cgroup and disable the OOM-killer.  It should be straightforward to add a listener for when memory limits are crossed.
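
To make the swappiness point concrete, here is a minimal Python sketch, assuming the cgroup v1 memory controller. The helper name set_swappiness is hypothetical (it is not part of HTCondor), and the demo writes into a scratch directory rather than a real cgroup:

```python
import tempfile
from pathlib import Path


def set_swappiness(cgroup_dir, value=0):
    """Write memory.swappiness for one cgroup and return the value read back.

    cgroup_dir is assumed to be the directory of a cgroup under the
    v1 memory controller; 0 tells the kernel to avoid swapping its pages.
    """
    if value < 0:
        raise ValueError("swappiness must not be negative")
    control = Path(cgroup_dir) / "memory.swappiness"
    control.write_text(f"{value}\n")
    return int(control.read_text().strip())


def demo():
    """Exercise the helper against a scratch directory, not a real cgroup."""
    with tempfile.TemporaryDirectory() as scratch:
        return set_swappiness(scratch, 0)
```

On a real worker node this would have to run as root against the actual cgroup directory, for example something like /sys/fs/cgroup/memory/condor. That path is an assumption: the mount point depends on the distro, and the cgroup name depends on how HTCondor's base cgroup is configured.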

Food for thought,

Brian


On Aug 15, 2013, at 12:02 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

> On 8/15/2013 5:10 AM, Chris Filo Gorgolewski wrote:
>> Yes, that's what I was afraid of: that this will cause swapping and slow
>> down the whole machine anyway.
> 
> Not sure how you conclude that swapping for a small subset of processes (perhaps one slot on a machine with many slots) will slow down execution of other processes/slots that have their entire image in RAM.  Is virtual memory management I/O in the kernel synchronous? I doubt it...
> 
> If you are really concerned, you can set your HTCondor startd PREEMPT expression to kill off jobs that are swapping after just a few seconds, i.e. jobs whose MemoryUsage > Memory.
> 
> With regards to the original topic of this thread, HTCondor v8.0.2 (scheduled for release next week) will include a patch to work around the kernel bug that can result in a reboot.  See
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3847
> 
> regards,
> Todd
> 
> 
>> 
>> On Fri, Aug 2, 2013 at 11:57 PM, Martin Bukatovic <
>> martin.bukatovic@xxxxxxxxx> wrote:
>> 
>>> Dear condor list,
>>> 
>>> On Thu, Aug 1, 2013 at 3:37 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx>
>>> wrote:
>>>> 
>>>> On Jul 23, 2013, at 7:58 AM, Chris Filo Gorgolewski <krzysztof.gorgolewski@xxxxxxxxx> wrote:
>>>> 
>>>>> On Thu, Jun 27, 2013 at 2:31 AM, Jason Ferrara <jason.ferrara@xxxxxxxxxxxxx> wrote:
>>>>> I have a pool of machines running CentOS 6.4, Kernel 2.6.32-358, and
>>>>> HTCondor 7.9.4.
>>>>> 
>>>>> Today, in order to try to stop jobs which underestimate their memory
>>>>> usage from making the machines swap a lot and get slow, I enabled cgroups
>>>>> and set
>>>>> 
>>>>> CGROUP_MEMORY_LIMIT_POLICY = soft
>>>>> RESERVED_MEMORY = 1024
>>>>>
>>>>> I have a similar issue (jobs which underestimate their memory usage), but
>>>>> I wasn't sure that cgroups would solve it. Do I understand correctly that
>>>>> setting CGROUP_MEMORY_LIMIT_POLICY to either "soft" or "hard" will disable
>>>>> swapping for all condor jobs?
>>>> 
>>>> - "hard" will kill the job as soon as it goes over its requested memory.
>>>> - "soft" will kill jobs that are over their requested memory only when
>>>> the kernel believes memory is tight.
>>> 
>>> This is not a correct description.
>>> 
>>> First of all, it's worth noting that neither hard nor soft limits should
>>> kill any process unless there is another good reason for the OOM killer
>>> to intervene. Such a situation may be running out of memory
>>> completely, but definitely not just going over a soft or hard memory
>>> limit. And even when the OOM killer is triggered, you can specify what
>>> that means for the processes in a particular cgroup via the
>>> memory.oom_control file of the memory cgroup controller.
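
As an illustration of the memory.oom_control point, here is a small Python sketch that parses the contents of that file as the cgroup v1 kernel documentation describes them (one "name value" pair per line). The sample text is made up for the example and parse_oom_control is a hypothetical helper; it only reads state and changes nothing:

```python
def parse_oom_control(text):
    """Parse the "name value" lines of a cgroup v1 memory.oom_control file."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            state[parts[0]] = int(parts[1])
    return state


# Illustrative contents: OOM killer disabled for the cgroup, not currently under OOM.
sample = "oom_kill_disable 1\nunder_oom 0\n"
print(parse_oom_control(sample))  # {'oom_kill_disable': 1, 'under_oom': 0}
```

With oom_kill_disable set to 1, tasks in the cgroup that hit the limit are paused rather than killed until memory is freed or the limit is raised, which is why something has to be listening for the OOM event.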
>>> 
>>> Unfortunately there is an exception to this: a known bug in the
>>> RHEL/CentOS kernel will kill processes of a cgroup with a small hard
>>> memory limit when the limit is breached. But this *is not* correct
>>> behaviour. Kernel developers are working on the fix, see the bugzilla
>>> for details: https://bugzilla.redhat.com/show_bug.cgi?id=870011
>>> 
>>> Back to CGROUP_MEMORY_LIMIT_POLICY: see what the upstream condor
>>> documentation states:
>>> 
>>> 
>>> http://research.cs.wisc.edu/htcondor/manual/v8.0/3_12Setting_Up.html#SECTION0041212000000000000000
>>> 
>>> <from condor docs>
>>> If the hard limit is in force, then the total amount of physical
>>> memory used by the sum of all processes in this job will not be
>>> allowed to exceed the limit. If the processes try to allocate more
>>> memory, the allocation will succeed, and virtual memory will be
>>> allocated, but no additional physical memory will be allocated. The
>>> system will keep the amount of physical memory constant by swapping
>>> some page from that job out of memory.
>>> 
>>> If the soft limit is in place, the job will be allowed to go over the
>>> limit if there is free memory available on the system. Only when there
>>> is contention between other processes for physical memory will the
>>> system force physical memory into swap and push the physical memory
>>> used towards the assigned limit.
>>>  </from condor docs>
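
The difference between the two policies maps onto two control files of the cgroup v1 memory controller. The Python sketch below only shows that kernel interface; it is not a description of how HTCondor itself applies CGROUP_MEMORY_LIMIT_POLICY. The helper name apply_memory_policy and the 1024 MB request are assumptions for the example, and the demo writes into a scratch directory rather than a real cgroup:

```python
import tempfile
from pathlib import Path

MB = 1024 * 1024

# cgroup v1 control file that corresponds to each policy
POLICY_FILES = {
    "hard": "memory.limit_in_bytes",       # physical memory is capped here
    "soft": "memory.soft_limit_in_bytes",  # reclaim target under contention
}


def apply_memory_policy(cgroup_dir, policy, request_mb):
    """Write the job's memory request (in MB) to the file matching the policy."""
    if policy not in POLICY_FILES:
        raise ValueError(f"unknown policy: {policy!r}")
    target = Path(cgroup_dir) / POLICY_FILES[policy]
    limit_bytes = request_mb * MB
    target.write_text(f"{limit_bytes}\n")
    return target.name, limit_bytes


def demo():
    """Apply a soft limit of 1024 MB inside a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        return apply_memory_policy(scratch, "soft", 1024)
```

In neither case does the write itself kill anything: the hard limit makes the kernel reclaim or swap the cgroup's own pages, and the soft limit only matters once the system is under memory pressure.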
>>> 
>>> Note that this description is consistent with kernel documentation for
>>> cgroups which was referenced in this thread before.
>>> 
>>> Martin B.
>>> 
>>>>> 
>>>>> Furthermore, is the free memory used in the "soft" policy calculated
>>>>> from the current system state rather than taken from the RESERVED_MEMORY
>>>>> variable?
>>>> 
>>>> No -- soft-memory kills are controlled by the kernel.  From
>>>> https://www.kernel.org/doc/Documentation/cgroups/memory.txt:
>>>> 
>>>> """
>>>> 7. Soft limits
>>>> 
>>>> Soft limits allow for greater sharing of memory. The idea behind soft limits
>>>> is to allow control groups to use as much of the memory as needed, provided
>>>> 
>>>> a. There is no memory contention
>>>> b. They do not exceed their hard limit
>>>> 
>>>> When the system detects memory contention or low memory, control groups
>>>> are pushed back to their soft limits. If the soft limit of each control
>>>> group is very high, they are pushed back as much as possible to make
>>>> sure that one control group does not starve the others of memory.
>>>> 
>>>> Please note that soft limits is a best-effort feature; it comes with
>>>> no guarantees, but it does its best to make sure that when memory is
>>>> heavily contended for, memory is allocated based on the soft limit
>>>> hints/setup. Currently soft limit based reclaim is set up such that
>>>> it gets invoked from balance_pgdat (kswapd).
>>>> """
>>>> 
>>>> I'm not a kernel programmer, but looking at the relevant kernel code, it
>>>> seems that the cgroup is checked prior to swapping in most cases.
>>> 
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>> 
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/