Re: [HTCondor-users] htcondor cgroups and memory limits on CentOS7



Hi Andrew,

right, I've just found the problem. The puppet template set
CGROUP_MEMORY_LIMIT = soft
instead of
CGROUP_MEMORY_LIMIT_POLICY = soft
It now sets some limits, but the soft memory limit looks like a default (the same number is used in the templates to limit the log files).

cat memory.soft_limit_in_bytes memory.memsw.limit_in_bytes
104857600
135721500672
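
For readability, the two raw byte counts above can be converted to binary units. This is only a minimal Python sketch; the helper name `human_bytes` is made up for illustration and is not part of HTCondor or the kernel tooling.

```python
def human_bytes(n: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


print(human_bytes(104857600))     # memory.soft_limit_in_bytes from above
print(human_bytes(135721500672))  # memory.memsw.limit_in_bytes from above
```

The soft limit comes out as exactly 100.0 MiB and the memsw limit as about 126.4 GiB.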


That said, it now at least kills the job when the memory greatly exceeds the limit:

condor_run 'hostname; stress -m 2 --vm-bytes 3G -c 1 -t 600s'
The job was aborted by the user

I will keep on looking. It was an interesting thread, even though the problem was, as usual, more trivial than expected.

cheers
alessandra

On 24/10/2017 11:03, andrew.lahiff@xxxxxxxxxx wrote:
Hi Alessandra,

Sorry, let me try that again.

For a job requesting 1 GB memory and "CGROUP_MEMORY_LIMIT_POLICY = soft", and using partitionable slots, I get:

[root@vm132 condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxxxxxxx]# cat memory.soft_limit_in_bytes
1073741824
[root@vm132 condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxxxxxxx]# cat memory.limit_in_bytes
4512206848
[root@vm132 condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxxxxxxx]# cat memory.memsw.limit_in_bytes
4512210944

This is on a test VM with about 4 GB of memory and about 500 MB of swap. The soft limit looks sensible to me and is consistent with what the job requested.

With "CGROUP_MEMORY_LIMIT_POLICY = hard" I get:

[root@vm132 condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxxxxxxx]# cat memory.soft_limit_in_bytes
0
[root@vm132 condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxxxxxxx]# cat memory.limit_in_bytes
1073741824
[root@vm132 condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxxxxxxxx]# cat memory.memsw.limit_in_bytes
4512210944

which has the hard limit as expected.
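
The pattern in the two sets of values above can be written down as a small check. This is a rough Python sketch based only on the numbers reported in this message, not on HTCondor's source; the function name `infer_policy` and its return strings are hypothetical.

```python
def infer_policy(requested_bytes: int, soft_limit: int, hard_limit: int) -> str:
    """Guess which memory limit policy produced the observed cgroup values.

    Assumption (from the values reported in this thread only):
      - "hard": memory.limit_in_bytes equals the job's requested memory
      - "soft": memory.soft_limit_in_bytes equals the job's requested memory
    Anything else is reported as "unknown".
    """
    if hard_limit == requested_bytes:
        return "hard"
    if soft_limit == requested_bytes:
        return "soft"
    return "unknown"


ONE_GIB = 1024 ** 3  # the job requested 1 GB, reported as 1073741824 bytes

# Values with CGROUP_MEMORY_LIMIT_POLICY = soft
print(infer_policy(ONE_GIB, soft_limit=1073741824, hard_limit=4512206848))
# Values with CGROUP_MEMORY_LIMIT_POLICY = hard
print(infer_policy(ONE_GIB, soft_limit=0, hard_limit=1073741824))
```

With the numbers above, the first call reports "soft" and the second reports "hard".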

Regards,
Andrew.


________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Alessandra Forti [Alessandra.Forti@xxxxxxx]
Sent: Tuesday, October 24, 2017 10:18 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] htcondor cgroups and memory limits on CentOS7

Hi Andrew,

On 24/10/2017 10:07, andrew.lahiff@xxxxxxxxxx wrote:
Hi Alessandra,

There seems to have been a change in behavior with respect to how HTCondor configures cgroups. With older versions of HTCondor, it used to set memory.soft_limit_in_bytes when using soft memory limits (at least this is what I remember).

However, now (e.g. in 8.6.6) memory.soft_limit_in_bytes seems to be set to the total memory of the machine, and memory.memsw.limit_in_bytes is set at memory that the job requested.
Not really. It isn't setting any of these limits on my nodes, not even
the machine memory.

pwd; cat memory.soft_limit_in_bytes memory.memsw.limit_in_bytes
/sys/fs/cgroup/memory/system.slice/condor.service/condor_scratch_condor_pool_condor_slot1_10@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
9223372036854771712
9223372036854771712
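
The number printed twice above is what the cgroup v1 memory controller reports when no limit is set: the largest signed 64-bit integer rounded down to a page boundary. A minimal Python sketch of that check follows; it assumes 4 KiB pages, and the names `UNLIMITED` and `is_unlimited` are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical for x86_64

# Largest signed 64-bit value, rounded down to a multiple of the page size.
UNLIMITED = (2**63 - 1) // PAGE_SIZE * PAGE_SIZE


def is_unlimited(limit_bytes: int) -> bool:
    """Return True if a cgroup v1 memory limit means 'no limit set'."""
    return limit_bytes >= UNLIMITED


print(UNLIMITED)
print(is_unlimited(9223372036854771712))  # value read from the cgroup above
print(is_unlimited(1073741824))           # a real 1 GiB limit, for contrast
```

Under that assumption, both values shown above count as "no limit set".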



  We use the Docker universe now so in our case it's Docker that's creating the cgroups.

Regards,
Andrew.

________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Alessandra Forti [Alessandra.Forti@xxxxxxx]
Sent: Tuesday, October 24, 2017 9:39 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] htcondor cgroups and memory limits on CentOS7

Hi Thomas,


On 24/10/2017 09:17, Thomas Hartmann wrote:

Hi Todd, (sorry to fork in between)

I am a bit confused regarding the soft limits.

So far I had assumed that the kernel would allow a cgroup to exceed its
soft limit usage as long as there is free memory available

Do you set the limit, or does your HTCondor? My HTCondor doesn't set that limit. Maybe I'm doing something wrong.

- and kill a
group's processes if the system runs low on unwired memory (assuming a
translation between limits in condor to cgroup limits).


So, we have effectively not set a 'real' cgroup hard limit, assuming that
the soft limit would be sufficient, e.g., would the kernel kill [1] when
exceeding its 4GB soft limit and running low on system-wide memory?

No, the kernel doesn't kill at the soft limit. This is why SYSTEM_PERIODIC_REMOVE is needed.

(looking now onto the values: would memsw -set to such a large value-
actually send the job heavily swapping...?)


In fact, memsw is where RAM+swap is limited. However, as pointed out in the thread, you may end up with a job that has 0 memory and 4GB of swap.


Cheers,
   Thomas



[1]
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.limit_in_bytes
142668537856
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.memsw.limit_in_bytes
142668541952
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
4294967296


On 2017-10-20 18:26, Todd Tannenbaum wrote:


On 10/20/2017 9:44 AM, Alessandra Forti wrote:


Hi,

is more information needed?



Hi Alessandra,

The version of HTCondor you are using would be helpful :).

But I have some answers/suggestions below that I hope will help...



* On the head node

RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage)  || <OtherParameters>

So there are two questions:

1) Why didn't SYSTEM_PERIODIC_REMOVE work?


Because the (system_)periodic_remove expressions are evaluated by the
condor_shadow while the job is running, and the *_RAW attributes are
only updated in the condor_schedd.

A simple solution is to use attribute MemoryUsage instead of
ResidentSetSize_RAW.  So I think things will work as you want if you
instead did:

   RemoveMemoryUsage = ( MemoryUsage > 2*RequestMemory )
   SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage)  || <OtherParameters>

Note that MemoryUsage is in the same units as RequestMemory, so you only
need to multiply by 2 instead of 2000.
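
The unit arithmetic behind the two expressions can be sketched in Python. This assumes ResidentSetSize_RAW is reported in KiB and both MemoryUsage and RequestMemory in MiB; the function names and the sample job are hypothetical and only mirror the expressions above.

```python
KIB_PER_MIB = 1024


def exceeds_raw(rss_raw_kib: int, request_memory_mib: int) -> bool:
    """Mirror of: ResidentSetSize_RAW > 2000*RequestMemory (KiB vs MiB)."""
    return rss_raw_kib > 2000 * request_memory_mib


def exceeds_usage(memory_usage_mib: int, request_memory_mib: int) -> bool:
    """Mirror of: MemoryUsage > 2*RequestMemory (both in MiB)."""
    return memory_usage_mib > 2 * request_memory_mib


# Hypothetical job: requests 1024 MiB but has 3 GiB resident.
request_mib = 1024
rss_kib = 3 * 1024 * 1024              # 3 GiB expressed in KiB
usage_mib = rss_kib // KIB_PER_MIB     # the same amount expressed in MiB

print(exceeds_raw(rss_kib, request_mib))      # 3145728 > 2048000
print(exceeds_usage(usage_mib, request_mib))  # 3072 > 2048
```

Both checks flag this job, which shows why the factor drops from 2000 to 2 once the attribute is already in the same units as RequestMemory.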

You are not the first person to be tripped up by this. :(  I realize it
is not at all intuitive. I think I will add a quick patch in the code to
allow _RAW attributes to be referenced inside of job policy expressions
to help prevent frustration by the next person.

Also you may want to place your memory limit policy on the execute nodes
via a startd policy expression, instead of having them enforced on the
submit machine (what I think you are calling the head node).  The reason
is the execute node policy is evaluated every five seconds, while the
submit machine policy is evaluated every several minutes.  A runaway job
could consume a lot of memory in a few minutes :).



2) Shouldn't htcondor set the job soft limit with this configuration?
or is the site expected to set the soft limit separately?



Personally, I think "soft" limits in cgroups are completely bogus.  The
way the Linux kernel treats soft limits does not do in practice what
anyone (including htcondor itself) expects.  I recommend setting
CGROUP_MEMORY_LIMIT_POLICY to either none or hard; soft makes no sense imho.

"CGROUP_MEMORY_LIMIT_POLICY=hard" is clear to understand: if the job uses more
memory than it requested, it is __immediately__ kicked off and put on
hold.  This way users get a consistent experience.

If you want jobs to be able to go over their requested memory so long as
the machine isn't swapping, consider disabling swap on your execute
nodes (not a bad idea for compute servers in general) and simply leaving
"CGROUP_MEMORY_LIMIT_POLICY=none".  What will happen is that if the system is
stressed, eventually the Linux OOM (out of memory) killer will kick in
and pick a process to kill.  HTCondor sets the OOM priority of job
processes such that the OOM killer should always pick job processes ahead
of other processes on the system.  Furthermore, HTCondor "captures" the
OOM request to kill a job and only allows it to continue if the job is
indeed using more memory than requested (i.e. provisioned in the slot).
This is probably what you wanted by setting the limit to soft in the
first place.

I am thinking we should remove the "soft" option to CGROUP_MEMORY_LIMIT_POLICY
in future releases; it just causes confusion imho.  Curious if others on
the list disagree...

Hope the above helps,
regards,
Todd

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




--
Respect is a rational process. \\//
Fatti non foste a viver come bruti, ma per seguir virtute e canoscenza(Dante)
For Ur-Fascism, disagreement is treason. (U. Eco)
But but but her emails... covfefe!