Re: [HTCondor-users] ResidentSetSize
- Date: Tue, 10 May 2016 16:24:13 -0400
- From: Bob Ball <ball@xxxxxxxxx>
- Subject: Re: [HTCondor-users] ResidentSetSize
Thanks, Todd, I will look at this. Also see below.
On 5/10/2016 4:01 PM, Todd Tannenbaum wrote:
On 5/10/2016 2:38 PM, Bob Ball wrote:
It would appear that the ResidentSetSize value does not immediately
appear in the Job ClassAd as a job starts, but only some short time
later. OK, I can live with that.
However, what is harder to understand is that we have seen instances
where it will display a value of 27GB when a job starts! I was trying
to use this on the schedd machine
SYSTEM_PERIODIC_REMOVE = ResidentSetSize > 5000*RequestMemory
but that 27GB cut was killing the jobs. However, once I removed this
from the schedd, similar jobs finished fine, with ClassAds reporting
only about 2.7GB of final ResidentSetSize.
Has anyone seen this kind of thing before? How could this be real, and
why does the ResidentSetSize not appear in the job ClassAd at job start?
We are running HTCondor 8.4.6 with cgroups enabled.
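Working through the units helps show what the threshold in that SYSTEM_PERIODIC_REMOVE line actually means. The sketch below assumes the units documented in the HTCondor manual: ResidentSetSize in KiB and RequestMemory in MiB. Under those units, `5000*RequestMemory` is a KiB limit of roughly 4.9 times the request. The helper names and the 2000 MiB request are hypothetical and chosen only for illustration. The sketch also treats "GB" in the thread as GiB, which is an assumption.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def original_policy_fires(resident_set_size_kib: int, request_memory_mib: int) -> bool:
    """Model of: ResidentSetSize > 5000*RequestMemory (hypothetical helper).

    Assumes ResidentSetSize is in KiB and RequestMemory is in MiB.
    """
    threshold_kib = 5000 * request_memory_mib
    return resident_set_size_kib > threshold_kib


def threshold_in_requests() -> float:
    # 5000 KiB allowed per requested MiB, expressed as a multiple of the request.
    return 5000 / KIB_PER_MIB


request_mib = 2000  # hypothetical RequestMemory; the thread does not give one
print(threshold_in_requests())  # 4.8828125
print(original_policy_fires(27 * KIB_PER_GIB, request_mib))  # True (27 GiB reading)
print(original_policy_fires(int(2.7 * KIB_PER_GIB), request_mib))  # False (2.7 GiB reading)
```

Under these assumed units, a 27 GiB reading trips the rule for any RequestMemory up to 5662 MiB.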
Couple quick thoughts:
1. We've seen Linux cgroups report large memory sizes for jobs because
they include memory used by the kernel to buffer file system
writes. If you have condor_config knob ENABLE_KERNEL_TUNING set to
True (the default), HTCondor should prevent this problem from
happening starting with v8.4.5. But if you disabled this for whatever
reason, that could be the culprit. For more info see
This is enabled and active.
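One way to see the effect described in point 1 is to compare the anonymous memory (rss) and page-cache (cache) lines for a job's cgroup. The sketch below assumes the cgroup v1 memory controller, whose memory.stat file lists `key value` pairs in bytes, including `rss` and `cache`. It parses a made-up sample string rather than a real file. The function name is hypothetical, and the sketch is not necessarily how HTCondor computes ResidentSetSize.

```python
from typing import Dict

# Made-up sample in the cgroup v1 memory.stat format (values in bytes).
SAMPLE_MEMORY_STAT = """\
cache 3221225472
rss 1073741824
mapped_file 0
"""

BYTES_PER_GIB = 1024 ** 3


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse 'key value' lines from a memory.stat-style string."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            key, value = parts
            stats[key] = int(value)
    return stats


stats = parse_memory_stat(SAMPLE_MEMORY_STAT)
anonymous_gib = stats["rss"] / BYTES_PER_GIB
with_cache_gib = (stats["rss"] + stats["cache"]) / BYTES_PER_GIB
print(f"rss only: {anonymous_gib:.1f} GiB")          # rss only: 1.0 GiB
print(f"rss + page cache: {with_cache_gib:.1f} GiB")  # rss + page cache: 4.0 GiB
```

A counter that folds in page cache reports the larger figure even though the process itself holds far less anonymous memory.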
2. Suggest using MemoryUsage for all policy expressions instead of
ResidentSetSize. By default, MemoryUsage is defined as
MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
How will this help in this expression, other than just rounding 27GB a
bit? What happens with this expression at first, when ResidentSetSize
is not yet available?
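A small model can help answer the question of what happens before ResidentSetSize exists. The sketch below assumes two ClassAd semantics. First, arithmetic or comparison involving UNDEFINED yields UNDEFINED. Second, a periodic remove expression removes a job only when it evaluates to True. Python's None stands in for UNDEFINED. The function names and the "5 * RequestMemory" rule are hypothetical.

```python
from typing import Optional

# None stands in for the ClassAd value UNDEFINED (an assumption of this model).
Undefined = None


def memory_usage_mib(resident_set_size_kib: Optional[int]) -> Optional[int]:
    """Model of MemoryUsage = ((ResidentSetSize + 1023) / 1024).

    Integer division on non-negative values rounds KiB up to whole MiB.
    UNDEFINED in, UNDEFINED out.
    """
    if resident_set_size_kib is Undefined:
        return Undefined
    return (resident_set_size_kib + 1023) // 1024


def greater_than(lhs: Optional[int], rhs: Optional[int]) -> Optional[bool]:
    """ClassAd-style '>' where any UNDEFINED operand gives UNDEFINED."""
    if lhs is Undefined or rhs is Undefined:
        return Undefined
    return lhs > rhs


def job_removed(policy_value: Optional[bool]) -> bool:
    # Periodic remove acts only when the expression is exactly True.
    return policy_value is True


def hypothetical_policy(rss_kib: Optional[int], request_memory_mib: int) -> Optional[bool]:
    # Hypothetical rule: MemoryUsage > 5 * RequestMemory (both in MiB).
    return greater_than(memory_usage_mib(rss_kib), 5 * request_memory_mib)


print(memory_usage_mib(1))     # 1
print(memory_usage_mib(1024))  # 1
print(memory_usage_mib(1025))  # 2
print(job_removed(hypothetical_policy(None, 2000)))  # False: no update yet
```

In this model, the MemoryUsage form behaves like the ResidentSetSize form before the first update: it evaluates to UNDEFINED, so the job is not removed. What changes is the unit, because MemoryUsage and RequestMemory are both in MiB.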
3. As to why ResidentSetSize does not appear in the job ClassAd at job
start: immediately at job start (i.e. right after the job is execed),
many jobs haven't allocated much memory yet, because they are still
initializing. So now the question is how long to wait... 1 second? 1
minute? By default HTCondor waits 8 seconds after job startup before
updating ResidentSetSize and other attributes. There is a
condor_starter config knob for this named
As noted, I can live with a delay.
I will check this out. I _think_ I've seen it before, but will have to
take a second look.
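The timing in point 3 can be modeled as a simple schedule. Nothing is reported until the starter's first update, which by default comes about 8 seconds after exec, and later updates follow at some periodic interval. The sketch below assumes that model. `initial_delay_s` and `interval_s` are hypothetical parameter names, not HTCondor knob names. The 300 second interval is only an example value. When a value reaches the schedd's copy of the job ad may lag further.

```python
from typing import List


def update_times(initial_delay_s: int, interval_s: int, horizon_s: int) -> List[int]:
    """Seconds after exec at which usage would be reported (hypothetical model)."""
    times: List[int] = []
    t = initial_delay_s
    while t <= horizon_s:
        times.append(t)
        t += interval_s
    return times


def starter_has_reported(elapsed_s: int, initial_delay_s: int = 8) -> bool:
    # Before the first update the attribute is absent, i.e. UNDEFINED to policy expressions.
    return elapsed_s >= initial_delay_s


print(update_times(initial_delay_s=8, interval_s=300, horizon_s=900))  # [8, 308, 608]
print(starter_has_reported(5))  # False
```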
4. You may be interested in the HOWTO at
Hope the above helps.
Any information is helpful.
HTCondor-users mailing list