RFR: JDK-8293114: JVM should trim the native heap

Thomas Stuefe stuefe at openjdk.org
Thu Jul 6 15:29:56 UTC 2023


On Thu, 6 Jul 2023 09:20:25 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> This is a continuation of https://github.com/openjdk/jdk/pull/10085. I closed https://github.com/openjdk/jdk/pull/10085 because it had accumulated too much comment history and got confusing. For a history of this issue, see previous discussions [1] and the comment section of 10085.
>>  
>> ---------------
>> 
>> This RFE adds the option to trim the Glibc heap periodically. This can recover a significant memory footprint if the VM process suffers from high-but-rare malloc spikes. It does not matter who causes the spikes: the JDK or customer code running in the JVM process.
>> 
>> ### Background:
>> 
>> Glibc is reluctant to return memory to the OS. Temporary malloc spikes often carry over as a permanent RSS increase. Note that C-heap retention is difficult to observe: since it is freed memory, it won't appear in NMT; it is just a part of RSS.
>> 
>> This is, effectively, caching - a performance tradeoff made by glibc. It makes a lot of sense for applications that cause high traffic on the C-heap. The JVM, however, clusters allocations and, for many of its use cases, rolls its own memory management based on virtual memory.
>> 
>> To manually trim the C-heap, Glibc exposes `malloc_trim(3)`. With JDK 18 [2], we added a new jcmd command to *manually* trim the C-heap on Linux (`jcmd System.trim_native_heap`). We then observed customers running this command periodically to slim down the process sizes of container-bound JVMs. That is cumbersome, and the JVM can do this a lot better - among other things because it knows best when *not* to trim.
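>> 
>> For illustration, the manual trim essentially boils down to a single glibc call; a minimal sketch (Linux/glibc only, reporting and error handling omitted):
>> 
>> ```c++
>> #include <malloc.h>   // glibc-specific, declares malloc_trim(3)
>> #include <cstdio>
>> 
>> int main() {
>>   // Ask glibc to release free heap memory back to the OS.
>>   // Returns 1 if any memory was released, 0 otherwise.
>>   int released = malloc_trim(0);
>>   std::printf("memory released: %s\n", released ? "yes" : "no");
>>   return 0;
>> }
>> ```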
>> 
>> #### GLIBC internals
>> 
>> The following information was taken from the glibc source code and from experimentation.
>> 
>> ##### Why do we need to trim manually? Does glibc not trim on free()?
>> 
>> Upon `free()`, glibc may return memory to the OS if:
>> - the returned block was mmap'ed
>> - the returned block was not added to tcache or to fastbins
>> - the returned block, possibly coalesced with its two immediate neighbors (if they are free), is larger than FASTBIN_CONSOLIDATION_THRESHOLD (64K) - in that case:
>>   a) for the main arena, glibc attempts to lower the brk()
>>   b) for mmap-ed heaps, glibc attempts to completely unmap or shrink the heap.
>> In both cases, (a) and (b), only the top portion of the heap is reclaimed. "Holes" between in-use chunks in the middle of the heap are not reclaimed.
>> 
>> So: glibc *may* automatically reclaim memory, but in normal configurations, with typical C-heap allocation granularity, it is unlikely to do so.
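>> 
>> A small standalone demo of this retention behavior (illustrative sketch only, Linux/glibc specific; the exact numbers depend on the glibc version and tuning):
>> 
>> ```c++
>> #include <malloc.h>    // glibc-specific, declares malloc_trim(3)
>> #include <cstdio>
>> #include <cstdlib>
>> 
>> // Read the process RSS in kB from /proc/self/status (Linux only).
>> static long rss_kb() {
>>   std::FILE* f = std::fopen("/proc/self/status", "r");
>>   char line[256];
>>   long kb = -1;
>>   while (f != nullptr && std::fgets(line, sizeof(line), f) != nullptr) {
>>     if (std::sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
>>   }
>>   if (f != nullptr) std::fclose(f);
>>   return kb;
>> }
>> 
>> int main() {
>>   const int N = 200000;
>>   static void* blocks[N];
>> 
>>   // Simulate a temporary malloc spike of many small blocks ...
>>   for (int i = 0; i < N; i++) blocks[i] = std::malloc(64);
>>   // ... which are then all freed again.
>>   for (int i = 0; i < N; i++) std::free(blocks[i]);
>> 
>>   std::printf("RSS after free: %ld kB\n", rss_kb()); // usually still carries the spike
>>   malloc_trim(0);                                     // explicit trim, as the jcmd does
>>   std::printf("RSS after trim: %ld kB\n", rss_kb()); // typically noticeably lower
>>   return 0;
>> }
>> ```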
>> 
>> To increase the ...
>
>> And an app's malloc load can fluctuate wildly, with temporary spikes and long idle periods.
> 
> Are you talking about allocations into native memory that a Java application does of its own accord, and not as a consequence of the JVM doing its own allocs? For compiling, for example.

@jdksjolen @shipilev New version:
- I removed all manual pause calls from the GC code and replaced them with simply not trimming when at or near a safepoint. This is less invasive, since I expect the typical trim interval to be much larger than the intervals at which we run our VM operations.
- The pauses in runtime code remain.
- I restored the original arena code and just added the pause. Though this code could greatly benefit from cleanups, I want to keep this patch focused and easy to downport.
- Since we no longer have close ties to the GC code, the tests don't need to run per GC, so I simplified them.

@shipilev: I kept the SuspendMark inside TrimNative because I like it that way. Otherwise, I would have to name it something like TrimNativeSuspendMark, so nothing would be gained but another symbol at global scope.
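
For illustration, the rough shape I have in mind is something like this (sketch only, not the actual patch code; names and details are simplified):

```c++
// Illustrative sketch of a periodic trimmer that skips its work while any
// SuspendMark is alive or while we are at or near a safepoint.
#include <atomic>

class TrimNative {
  static std::atomic<int> _suspend_count;   // > 0 means: do not trim right now

 public:
  // RAII guard: runtime code that must not race with a trim holds one of these.
  class SuspendMark {
   public:
    SuspendMark()  { _suspend_count.fetch_add(1, std::memory_order_relaxed); }
    ~SuspendMark() { _suspend_count.fetch_sub(1, std::memory_order_relaxed); }
  };

  // Called from the periodic trim task; at_or_near_safepoint would come from the VM.
  static void trim_if_allowed(bool at_or_near_safepoint) {
    if (at_or_near_safepoint) return;   // just skip this cycle instead of queueing work
    if (_suspend_count.load(std::memory_order_relaxed) > 0) return;
    // os::trim_native_heap();          // on glibc this ends up as malloc_trim(0)
  }
};

std::atomic<int> TrimNative::_suspend_count{0};
```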

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14781#issuecomment-1623878277

