I set up cgroups on my HTCondor cluster some months ago. I expected cgroups to handle the soft limits and HTCondor to kill jobs via SYSTEM_PERIODIC_REMOVE once they use more than twice the requested memory. However, last week a user wreaked havoc on the nodes, using up to 35GB of RSS when his limit should have been 4GB.

My settings are as follows:

* On the WNs

# Enable CGROUP
BASE_CGROUP = /system.slice/condor.service

* On the head node

RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage) || <OtherParameters>

This is a setup that other sites also use.

The cgroup doesn't have any limit set, neither soft nor hard.

So I have two questions:

1) Why didn't SYSTEM_PERIODIC_REMOVE work? Here is an example of a job that exceeded the 4GB limit:

condor_history 66469.0 -autoformat ClusterId 2000*RequestMemory ResidentSetSize_RAW
66469 4000000 34723028
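As a sanity check on the units, here is a small Python sketch of the same comparison. It assumes (as I understand the job ClassAd conventions) that RequestMemory is in MiB and ResidentSetSize_RAW is in KiB; the variable names are mine and only illustrate the arithmetic, not how the schedd evaluates the expression.

```python
# Values copied from the condor_history output above.
threshold_kib = 4000000   # 2000*RequestMemory (assumed to be in KiB)
rss_kib = 34723028        # ResidentSetSize_RAW (assumed to be in KiB)

KIB_PER_GIB = 1024 * 1024

# Recover the requested memory (assumed MiB) from the 2000x factor.
request_memory_mib = threshold_kib // 2000

# Same comparison as RemoveMemoryUsage in the config.
should_remove = rss_kib > threshold_kib

# How far over the threshold the job went.
overshoot = rss_kib / threshold_kib

print(f"RequestMemory      : {request_memory_mib} MiB")
print(f"Removal threshold  : {threshold_kib / KIB_PER_GIB:.2f} GiB")
print(f"ResidentSetSize_RAW: {rss_kib / KIB_PER_GIB:.2f} GiB")
print(f"Expression is true : {should_remove}")
print(f"Overshoot factor   : {overshoot:.2f}x")
```

So, under those unit assumptions, the expression should have evaluated to true for this job by a wide margin.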

2) Shouldn't HTCondor set the job's soft limit with this configuration? Or is the site expected to set the soft limit separately?



