RFR: JDK-8293114: JVM should trim the native heap
Thomas Stuefe
stuefe at openjdk.org
Thu Jul 6 10:07:25 UTC 2023
This is a continuation of https://github.com/openjdk/jdk/pull/10085. I closed https://github.com/openjdk/jdk/pull/10085 because it had accumulated too much comment history and got confusing. For a history of this issue, see previous discussions [1] and the comment section of 10085.
---------------
This RFE adds the option to trim the Glibc heap periodically. This can significantly reduce the memory footprint if the VM process suffers from high but rare malloc spikes. It does not matter who causes the spikes: the JDK or customer code running in the JVM process.
### Background:
Glibc is reluctant to return memory to the OS. Temporary malloc spikes often carry over as a permanent RSS increase. Note that C-heap retention is difficult to observe: since it is freed memory, it does not appear in NMT; it is just part of RSS.
This is effectively caching, a performance tradeoff made by glibc. It makes a lot of sense for applications that cause high traffic on the C-heap. The JVM, however, clusters allocations and, for many of its use cases, rolls its own memory management on top of virtual memory.
To manually trim the C-heap, glibc exposes `malloc_trim(3)`. With JDK 18 [2], we added a new jcmd command to *manually* trim the C-heap on Linux (`jcmd System.trim_native_heap`). We then observed customers running this command periodically to slim down the process size of container-bound JVMs. That is cumbersome, and the JVM can do this much better, not least because it knows best when *not* to trim.
#### GLIBC internals
I took the following information from the glibc source code and from experimentation.
##### Why do we need to trim manually? Doesn't glibc trim on free?
Upon `free()`, glibc may return memory to the OS if:
- the returned block was mmap'ed
- the returned block was not added to tcache or to fastbins
- the returned block, after being merged with its immediate neighbors if they are free, is larger than FASTBIN_CONSOLIDATION_THRESHOLD (64K). In that case:
a) for the main arena, glibc attempts to lower the brk()
b) for mmap-ed heaps, glibc attempts to completely unmap or shrink the heap.
In both cases, (a) and (b), only the top portion of the heap is reclaimed. "Holes" in the middle of other in-use chunks are not reclaimed.
So: glibc *may* automatically reclaim memory. In normal configurations, with typical C-heap allocation granularity, it is unlikely.
To increase the chance of auto-reclamation happening, one can do one or more things:
- a) increase allocation sizes
- b) limit mallocs to very few threads, ideally just one
- c) set MALLOC_ARENA_MAX=1
- d) set the `glibc.malloc.trim_threshold` to a very low value, e.g., 1
But:
- (a) is not possible for foreign code; note that even within hotspot, where we cluster allocations using hotspot arenas, the typical arena chunk size is too small to be auto-reclaimed
- (b) may just not be feasible
- (c) is *terrible* for performance since many C-heap operations will compete for the lock of that single arena
- (d) works if either (b) or (c) is in place and if the released block happens to border the top of the arena. It also costs performance.
The JVM has only limited influence on (a) and none on (b); (c) is a really bad idea, and hence (d) often does little. That mirrors my practical experience.
##### How does `malloc_trim()` differ from trimming on free() ?
`malloc_trim()` looks for free holes larger than a page, so it does not limit itself to reclaiming memory at the top of the arena. It then releases those holes with `madvise(MADV_DONTNEED)`. It does that for every arena.
##### What are the cons of calling `malloc_trim()`?
`malloc_trim()` cannot be interrupted. Once it runs, it runs. The runtime of `malloc_trim()` is not predictable. If there is nothing to reclaim, it is very fast (sub-ms). If there is a lot to reclaim (e.g. >32GB), I saw times of up to 800ms.
Moreover, `malloc_trim()` locks each arena while trimming it. That may lock out concurrent C-heap operations in threads that use this arena. Note, however, that this is rare, since many operations are satisfied from the tcache and therefore don't lock.
##### What about the `pad` parameter for `malloc_trim()`?
I found it has very little effect. It only affects how many bytes are preserved at the top of the main arena. It does not affect other arenas, nor does it affect how much space malloc_trim reclaims by releasing "holes", which is the main part of memory release.
### The Patch
The patch adds two new experimental options:
- `-XX:+GCTrimNativeHeap`
- `-XX:GCTrimNativeHeapInterval=<seconds>` (defaults to 60)
`GCTrimNativeHeap` is off by default. If enabled, it will cause the VM to trim the native heap periodically. The period is defined by `GCTrimNativeHeapInterval`.
Periodic trimming is done in its own thread. We cannot burden the ServiceThread, since the runtime of trims is unpredictable.
The patch also adds a way to suspend trimming temporarily; if suspended, no trims will start, but ongoing trims will still finish.
The patch uses this mechanism to suspend trimming during GC STW phases and whenever we are about to do bulk C-heap operations (e.g. deleting deflated monitors).
### Examples:
This is an artificial test that causes two high malloc spikes with long idle periods.
(yellow) NMT shows two spikes for malloc'ed memory;
(red) RSS of the baseline JVM shows that we reach a maximum and then never recover. This is glibc retaining the freed memory.
(blue) RSS of the patched JVM shows that we recover RSS in steps by doing periodic C-heap trimming.
![alloc-test](https://raw.githubusercontent.com/tstuefe/autotrim-experiments/master/alloc-c-heap-repro/basline-vs-autotrim30s.png)
(For the parameters, see the [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-all.sh).)
### Tests
Tested with an older glibc (2.31) and a newer glibc (2.35) (`mallinfo()` vs. `mallinfo2()`) on Linux x64.
Older versions of this patch were routinely tested at SAP for almost half a year.
- [1] https://mail.openjdk.org/pipermail/hotspot-dev/2021-August/054323.html
- [2] https://bugs.openjdk.org/browse/JDK-8269345
-------------
Commit messages:
- Fix test for non-glibc platforms
- Initial implementation
Changes: https://git.openjdk.org/jdk/pull/14781/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14781&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8293114
Stats: 643 lines in 20 files changed: 637 ins; 1 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/14781.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14781/head:pull/14781
PR: https://git.openjdk.org/jdk/pull/14781