<div dir="ltr"><div>Disclaimer: I am not a G1 expert, just a curious bystander.</div><div><br></div><div>Just some questions:</div><div>- What is the standard deviation across those test runs, i.e. how reliable are the results?</div><div>- Are you sure that THP is set to madvise mode on the OS side, and not to always?</div><div>- Can you exclude the (as you say, large) off-heap memory as a cause for the delta? So the only THP memory in use is the Java heap? Is anyone madvising those off-heap memory regions, third-party native code maybe?</div><div><br></div><div>Note, you can use jcmd &lt;pid&gt; System.map to get a look at the THP state of all memory regions of the process. It shows you which areas are eligible for THP coalescing, and which are already coalesced. That includes the heap and your off-heap memory (if you happen to know its address).</div><div><br></div><div>Other than that, my first guess would be that any performance regression with THP stems from huge pages repeatedly forming and shattering. E.g. the OS coalesces large pages while the JVM concurrently uncommits small-page-sized portions of just-coalesced THP pages.</div><div><br></div><div><br></div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Aug 28, 2025 at 4:42 PM Brian S O'Neill &lt;<a href="mailto:bronee@gmail.com">bronee@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm experimenting with the changes in PR 23739 (8342382: Implement JEP <br>

522: G1 GC: Improve Throughput by Reducing Synchronization) and I'm <br>

seeing a small performance regression when large pages are configured.<br>

<br>

The test is fairly complicated, and most of the memory it uses is off <br>

heap. The GC heap size is set to 3GB (min and max), which is much larger <br>

than is actually required. A bunch of objects are allocated up front and <br>

remain in the old gen for the duration of the test run. Between each GC <br>

cycle, almost all the old gen objects will have been updated to <br>

reference a young object. The young gen objects live for about 2 <br>

microseconds, and the references from the old gen objects are cleared.<br>

<br>

Here are the baseline results when running with "normal" pages:<br>

<br>

ParallelGC:   235.6 seconds<br>

G1GC JEP 522: 238.7 seconds<br>

G1GC:         241.5 seconds<br>

ZGC:          246.2 seconds<br>

<br>

With JEP 522, there's a small performance improvement, about 1%, which <br>

is nice to see. Here are the results when running with large pages <br>

(-XX:+UseLargePages -XX:+UseTransparentHugePages; shmem_enabled is advise):<br>

<br>

ParallelGC:   228.9 seconds<br>

G1GC:         235.1 seconds<br>

ZGC:          239.3 seconds<br>

G1GC JEP 522: 239.7 seconds<br>

<br>

All of the GCs show a performance improvement when using large pages, <br>

but with JEP 522, G1 is slower than the current version (JDK 24).<br>

<br>

I don't know why there's a performance regression. Is this to be <br>

expected with large pages, or is there a missing configuration <br>

somewhere? I'm not configuring anything other than -Xms, -Xmx, and the <br>

large page settings. Also note that the test is run ten times (without <br>

restarting the JVM) and the average time is reported.<br>

<br>

</blockquote></div>
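<div><br></div><div>To make the standard deviation question concrete, here is a minimal Java sketch, assuming the ten per-run times are recorded individually rather than only averaged. The class name RunStats and the sample values in main are hypothetical placeholders, not measurements from this thread.</div>
<pre>
```java
public class RunStats {
    // Arithmetic mean of the per-run times.
    static double mean(double[] times) {
        double sum = 0.0;
        for (double t : times) {
            sum += t;
        }
        return sum / times.length;
    }

    // Sample standard deviation (divides by n - 1), suited to a handful of runs.
    static double sampleStdDev(double[] times) {
        int n = times.length;
        if (n == 0 || n == 1) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double m = mean(times);
        double sumSq = 0.0;
        for (double t : times) {
            double d = t - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (n - 1));
    }

    public static void main(String[] args) {
        // Hypothetical per-run times in seconds; NOT measurements from this thread.
        double[] times = {239.1, 240.3, 238.8, 241.0, 239.5, 240.2, 239.9, 238.7, 240.6, 239.4};
        double sd = sampleStdDev(times);
        // Standard error of the mean: how far the reported average itself may be off.
        double sem = sd / Math.sqrt(times.length);
        System.out.printf("mean=%.2f s, stddev=%.2f s, stderr=%.2f s%n", mean(times), sd, sem);
    }
}
```
</pre>
<div>If the gap between two configurations is no larger than a couple of standard errors, it is probably hard to call it a regression with any confidence.</div>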