JEP 522 performance regression with large pages

Thomas Stüfe thomas.stuefe at gmail.com
Thu Aug 28 15:20:03 UTC 2025


Disclaimer: I am not a G1 expert, just a curious bystander.

Just some questions:
- what is the standard deviation on those tests; meaning, how reliable are
the results?
- are you sure that THP is set to madvise mode on the OS side, not to
always?
- can you exclude the (as you say, large) off-heap memory as a cause for
the delta? Is the Java heap the only THP memory in use? Is anything
madvising those off-heap memory regions, third-party native code maybe?

Note, you can use jcmd System.map to get a look at the THP state of all
memory regions of the process. It shows you which areas are eligible for
THP coalescing, and which are already coalesced. That includes the heap
and your off-heap memory (if you happen to know its address).
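For example (12345 is a placeholder for the target JVM's process id):

```shell
# List running JVMs to find the target process id.
jps -l

# Print the process memory map, including per-region THP state
# (which regions are THP-eligible and which are already coalesced).
jcmd 12345 System.map
```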

Other than that, my first guess would be that any performance regression
with THP stems from huge pages repeatedly forming and shattering, e.g. the
OS coalescing large pages concurrently while the JVM uncommits
small-page-sized portions of just-coalesced THP pages.



On Thu, Aug 28, 2025 at 4:42 PM Brian S O'Neill <bronee at gmail.com> wrote:

> I'm experimenting with the changes in PR 23739 (8342382: Implement JEP
> 522: G1 GC: Improve Throughput by Reducing Synchronization) and I'm
> seeing a small performance regression when large pages are configured.
>
> The test is fairly complicated, and most of the memory it uses is off
> heap. The GC heap size is set to 3GB (min and max), which is much larger
> than is actually required. A bunch of objects are allocated up front and
> remain in the old gen for the duration of the test run. Between each GC
> cycle, almost all the old gen objects will have been updated to
> reference a young object. The young gen objects live for about 2
> microseconds, and the references from the old gen objects are cleared.
>
> Here are the baseline results when running with "normal" pages:
>
> ParallelGC:   235.6 seconds
> G1GC JEP 522: 238.7 seconds
> G1GC:         241.5 seconds
> ZGC:          246.2 seconds
>
> With JEP 522, there's a small performance improvement, about 1%, which
> is nice to see. Here are the results when running with large pages
> (-XX:+UseLargePages -XX:+UseTransparentHugePages, with shmem_enabled set
> to advise):
>
> ParallelGC:   228.9 seconds
> G1GC:         235.1 seconds
> ZGC:          239.3 seconds
> G1GC JEP 522: 239.7 seconds
>
> All of the GCs show a performance improvement when using large pages,
> but with JEP 522, G1 is slower than the current version (JDK 24).
>
> I don't know why there's a performance regression. Is this to be
> expected with large pages, or is there a missing configuration
> somewhere? I'm not configuring anything other than -Xms, -Xmx, and the
> large page settings. Also note that the test is run ten times (without
> restarting the JVM) and the average time is reported.
>
>
