RFR: 8306841: Generational ZGC: NMT reports Java heap size larger than max heap size
Thomas Stuefe
stuefe at openjdk.org
Wed Jun 7 14:23:59 UTC 2023
On Wed, 7 Jun 2023 13:30:05 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
> ZGC has separated the committing of physical memory from the mapping of the committed memory to virtual memory. It also has asynchronous, lazy unmapping of virtual memory from physical memory. This leads to a situation where multiple virtual memory areas can be mapped to the same physical memory. NMT has a strong assumption that there's a 1-to-1 correspondence between committed memory and its virtual memory areas. Because of this, NMT and ZGC are not entirely compatible. ZGC has worked around this by adding NMT hooks where the virtual memory is mapped to the committed memory. This mostly works, but there are situations where we have multiple virtual memory areas mapped to the same physical memory, and that causes the NMT values to be inflated.
>
> I propose that we move the NMT committed memory tracking from the mapping of virtual memory to the actual committing of physical memory.
>
> FWIW, given that NMT and ZGC don't agree about how memory is committed, we have to fake the virtual memory addresses reported to NMT. This could probably be noticed if you look for the Java heap addresses in the NMT details output, but I don't see why anyone should be looking for those addresses for the Java heap in NMT. The interesting number is the amount of committed memory, not the exact addresses, IMHO. This isn't something that we change with this patch, but it can be worth understanding while looking at this bug and the associated PR.
>
> I've written a small sanity test for the NMT Java Heap values; however, it's non-trivial to write a test that efficiently provokes this. I've verified this fix by manually running an over-provisioned SPECjbb2015 run, which results in a lot of splitting of ZGC heap regions, which in turn gives us multiple virtual memory area mappings for the same physical memory.
>
> Side note: the lazy unmapping of virtual memory can cause other problems with too many virtual memory areas. The inflated NMT numbers have been a smoking gun showing us that issue. We are tracking that issue with [JDK-8308783](https://bugs.openjdk.org/browse/JDK-8308783).
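For other readers, here is how I picture the inflation described above. This is a toy model and not HotSpot code: `ToyHeap`, `Totals`, and `remap_with_lazy_unmap` are hypothetical names, and the addresses are made up. It only shows that summing over virtual mappings double-counts physical memory that is temporarily reachable through two views, while summing at commit time does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy model of the two accounting strategies (hypothetical, not HotSpot code).
struct ToyHeap {
  std::map<size_t, size_t> committed;     // physical offset -> size
  std::map<uintptr_t, size_t> mappings;   // virtual address -> size

  void commit(size_t offset, size_t size) {
    assert(size > 0);
    committed[offset] = size;
  }
  void map(uintptr_t vaddr, size_t size) { mappings[vaddr] = size; }
  void unmap(uintptr_t vaddr) { mappings.erase(vaddr); }

  // Accounting done where virtual memory is mapped (the old hook placement).
  size_t total_by_mapping() const {
    size_t sum = 0;
    for (const auto& m : mappings) sum += m.second;
    return sum;
  }

  // Accounting done where physical memory is committed (the proposal).
  size_t total_by_commit() const {
    size_t sum = 0;
    for (const auto& c : committed) sum += c.second;
    return sum;
  }
};

struct Totals {
  size_t by_mapping;
  size_t by_commit;
};

// Commit 2M once, map it, then map it again at a new address before the
// old view has been (lazily) unmapped.
Totals remap_with_lazy_unmap() {
  const size_t M = 1024 * 1024;
  ToyHeap heap;
  heap.commit(0, 2 * M);
  heap.map(0x40000000u, 2 * M);
  heap.map(0x80000000u, 2 * M);  // second view of the same physical memory
  return Totals{heap.total_by_mapping(), heap.total_by_commit()};
}
```

In this model the mapping-based total is twice the commit-based total until the stale view is unmapped, which matches the symptom in the bug title.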
Looks good.
Question: why is this limited to generational ZGC? Just a decision not to fix old ZGC, or does it not happen with old ZGC?
> FWIW, given that NMT and ZGC don't agree about how memory is committed, we have to fake the virtual memory addresses reported to NMT. This could probably be noticed if you look for the Java heap addresses in the NMT details output, but I don't see why anyone should be looking for those addresses for the Java heap in NMT.
We do, but it is not such an important use case: in hs_err file "unknown pointer" printing, I use NMT to make sense of an otherwise unknown address.
src/hotspot/share/gc/z/zPhysicalMemory.cpp line 285:
> 283: // When this function is called we don't know where in the virtual memory
> 284: // this physical memory will be mapped. So we fake that the virtual memory
> 285: // address is the heap base + the given offset.
Question from a casual ZGC source reader: when you talk about physical vs virtual here, you are not talking about the real physical vs virtual, right? You are talking about offsets into the ZGC backing file vs attach points of said offsets in the virtual address space?
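My reading of the quoted comment, as a sketch. The names `NmtRange`, `fake_nmt_range`, and `contains` are hypothetical and are not the code in the patch; the only thing taken from the source is that the reported address is the heap base plus the given offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the range that gets reported to NMT.
struct NmtRange {
  uintptr_t addr;
  size_t size;
};

// At commit time the real attach point is unknown, so the reported
// address is derived from the offset alone: heap base + offset.
NmtRange fake_nmt_range(uintptr_t heap_base, size_t offset, size_t size) {
  assert(size > 0);
  return NmtRange{heap_base + offset, size};
}

// True if p falls inside the reported range.
bool contains(const NmtRange& r, uintptr_t p) {
  return p >= r.addr && p - r.addr < r.size;
}
```

If that reading is right, the reported range is stable per offset no matter how many views exist, but a pointer into a view attached elsewhere would not fall inside it, which is the hs_err lookup case I mentioned.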
-------------
Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14355#pullrequestreview-1467796287
PR Review Comment: https://git.openjdk.org/jdk/pull/14355#discussion_r1221676342