
Re: [HTCondor-users] ResidentSetSize

On 5/10/2016 2:38 PM, Bob Ball wrote:
It would appear that the ResidentSetSize value does not immediately
appear in the Job ClassAd as a job starts, but only some short time
later.  OK, I can live with that.

However, what is harder to understand is that we have seen instances
where it will display a value of 27GB when a job starts!  I was trying
to use this on the schedd machine
    SYSTEM_PERIODIC_REMOVE = ResidentSetSize > 5000*RequestMemory
but that 27GB cut was killing the jobs.  However, once I removed this
from the schedd, similar jobs finished fine, with ClassAds reporting
only about 2.7GB of final ResidentSetSize.

Has anyone seen this kind of thing before?  How could this be real, and
why does the ResidentSetSize not appear in the job ClassAd at job start?

We are running HTCondor 8.4.6 with cgroups enabled.

Hi Bob,

Couple quick thoughts:

1. We've seen Linux cgroups report large memory sizes for jobs because the reported figure includes memory the kernel uses to buffer file system writes. If you have the condor_config knob ENABLE_KERNEL_TUNING set to True (the default), HTCondor should prevent this problem starting with v8.4.5. But if you disabled it for whatever reason, that could be the culprit. For more info see

2. Suggest using MemoryUsage for all policy expressions instead of ResidentSetSize.

3. As to why ResidentSetSize does not appear in the job ClassAd at job start: immediately at job start (i.e. right after the job is exec'ed), many jobs haven't allocated much memory yet because they are still initializing. So the question is how long to wait: 1 second? 1 minute? By default HTCondor waits 8 seconds after job startup before updating ResidentSetSize and other attributes. There is a condor_starter config knob for this named STARTER_INITIAL_UPDATE_INTERVAL.

4. You may be interested in the HOWTO at

hope the above helps
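
For points 2 and 3, a rough sketch of what the configuration might look like. The 5x threshold is only an assumption based on the original 5000*RequestMemory expression (ResidentSetSize is reported in KB, while MemoryUsage and RequestMemory are in MB), and the interval shown is just the default written out explicitly; adjust both to suit your site.

    # On the schedd machine.
    # Sketch only: the factor of 5 is a placeholder, not a recommendation.
    # MemoryUsage and RequestMemory are both in MB, so no unit scaling is needed.
    SYSTEM_PERIODIC_REMOVE = MemoryUsage > 5 * RequestMemory

    # On the execute machines (read by the condor_starter).
    # Seconds to wait after job startup before the first update of
    # ResidentSetSize and related attributes; 8 is the default.
    STARTER_INITIAL_UPDATE_INTERVAL = 8

Until the first update arrives, MemoryUsage is not yet defined in the job ClassAd, so the comparison should not evaluate to True and the job should not be removed on that basis.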