<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt; font-family: Verdana,Geneva,sans-serif'>
<div style="font-size: 10pt; font-family: Verdana,Geneva,sans-serif;">
<div style="font-size: 10pt; font-family: Verdana,Geneva,sans-serif;">
<p>Thank you for your advice. Let me add a few clarifications:</p>
<p>* for `-XX:+UseCompressedOops`, I must admit I did not know this option: I added it because JDK Mission Control warned me about it in the "Automated analysis results" after a first try ("Compressed Oops is turned off [...]. Use the JVM argument '-XX:+UseCompressedOops' to enable this feature").</p>
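<p>(Just as a sketch, not something from my original runs: one way to double-check how the JVM resolves this flag is to dump the final flag values. With ZGC it should come out forced to false, since ZGC does not support compressed oops, while G1 enables it on its own for a 4g heap:)</p>
<p>`java -Xmx4g -XX:+UseZGC -XX:+PrintFlagsFinal -version | grep UseCompressedOops #ZGC: expected false`<br />`java -Xmx4g -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep UseCompressedOops #G1: expected true (ergonomic)`</p>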
<p>* it is true that the application wastes time in GC pauses (46.6% of the time with G1): I wanted an example app which exercises the GC a lot. Maybe this is a little too much compared to real apps (even if, for some of them, we may wonder...).</p>
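<p>(As a side note, and only as a sketch, a rough way to cross-check that figure outside of JMC is unified GC logging: summing the reported pause times and comparing them with the wall-clock time of the run, e.g.:)</p>
<p>`time java -Xmx4g -XX:+UseG1GC -Xlog:gc -classpath target/classes poc.java.perf.write.TestPerf #sum the logged pause times vs. real time`</p>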
<p>* the stack I showed about finding a new empty page/region for allocation is present in both cases (with JFR and without JFR). But in the case with JFR it is much wider: it accounts for many more samples.</p>
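<p>(If it helps, the stalls should also be visible in the recording itself, assuming the ZGC-specific JFR events such as ZAllocationStall are available in this JDK. With a recording written via `-XX:StartFlightRecording=filename=recording.jfr`, something like the following should list them; this is only a sketch, not output I have already collected:)</p>
<p>`jfr summary recording.jfr`<br />`jfr print --events ZAllocationStall recording.jfr #assumes this event name exists in this JDK`</p>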
<p>Best regards,</p>
<p>Fabrice</p>
<p><br /></p>
<p id="v1v1reply-intro">Le 2026-01-12 14:18, Thomas Schatzl a écrit :</p>
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">
<div class="v1v1pre">Hi,<br /><br /> while not being able to answer the question about why using JFR takes so much additional time, when reading about your benchmark setup the following things came to my mind:<br /><br />* -XX:+UseCompressedOops for ZGC does nothing (ZGC does not support compressed oops at all), and G1 will automatically use it. You can leave it off.<br /><br />* G1 having a significantly worse throughput than ZGC is very rare: even then the extent you show is quite large. Taking some of content together (4g heap, Maps, huge string variables) indicates that you might have run into a well-known pathology of G1 with large objects: the application might waste up to 50% of your application due to these humongous objects [<a href="https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html" target="_blank" rel="noopener noreferrer">0</a>].<br />G1 might work better in JDK 26 too as some enhancement to some particular case has been added. More is being worked on.<br /><br />TL;DR: Your application might run much better with a large(r) G1HeapRegionSize setting. Or just upgrading to JDK 26.<br /><br />* While ZGC does not have that in some cases extreme memory wastage for large allocations, there is still some. Adding JFR might just push it over the edge (the stack you showed are about finding a new empty page/region for allocation, failing to do so, doing a GC, stalling and waiting).<br /><br />Hth,<br /> Thomas<br /><br />[0] <a href="https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html" target="_blank" rel="noopener noreferrer">https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html</a><br /><br />On 11.01.26 19:23, Fabrice Bibonne wrote:
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">Hi all,<br /><br /> I would like to report a case where starting jfr for an application running with zgc causes a significant throughput degradation (compared to when JFR is not started).<br /><br /> My context : I was writing a little web app to illustrate a case where the use of ZGC gives a better throughput than with G1. I benchmarked with grafana k6 my application running with G1 and my application running with ZGC : the runs with ZGC gave better throughputs. I wanted to go a bit further in explanation so I began again my benchmarks with JFR to be able to illustrate GC gains in JMC. When I ran my web app with ZGC+JFR, I noticed a significant throughput degradation in my benchmark (which was not the case with G1+JFR).<br /><br /> Although I did not measure an increase in overhead as such, I still wanted to report this issue because the degradation in throughput with JFR is such that it would not be usable as is on a production service.<br /><br />I wrote a little application (not a web one) to reproduce the problem : the application calls a little conversion service 200 times with random numbers in parallel (to be like a web app in charge and to pressure GC). The conversion service (a method named `convertNumberToWords`) convert the number in a String looking for the String in a Map with the number as th key. In order to instantiate and destroy many objects at each call, the map is built parsing a huge String at each call. Application ends after 200 calls.<br /><br />Here are the step to reproduce :<br />1. Clone <a href="https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact" target="_blank" rel="noopener noreferrer">https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact</a> (be aware to be on branch jfr+zgc_impact)<br />2. Compile it (you must include numbers200k.zip in resources : it contains a 36 Mo text files whose contents are used to create the huge String variable)<br />3. in the root of repository :<br />3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR`<br />3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops - XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf #ZGC with JFR`<br />4. The real time of the second run (with JFR) will be considerably higher than that of the first<br /><br />I ran these tests on my laptop :<br />- Dell Inc. Latitude 5591<br />- openSUSE Tumbleweed 20260108<br />- Kernel : 6.18.3-1-default (64-bit)<br />- 12 × Intel® Core™ i7-8850H CPU @ 2.60GHz<br />- RAM 16 Gio<br />- openjdk version "25.0.1" 2025-10-21<br />- OpenJDK Runtime Environment (build 25.0.1+8-27)<br />- OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing)<br />- many tabs opened in firefox !<br /><br />I also ran it in a container (eclipse-temurin:25) on my laptop and with a windows laptop and came to the same conclusions : here are the measurements from the container :<br /><br />| Run with | Real time (s) |<br />|-----------|---------------|<br />| ZGC alone | 7.473 |<br />| ZGC + jfr | 25.075 |<br />| G1 alone | 10.195 |<br />| G1 + jfr | 10.450 |<br /><br /><br />After all these tests I tried to run the app with an other profiler tool in order to understand where is the issue. 
I attach the flamegraph from running JFR+ZGC: for the worker threads of the Stream's ForkJoinPool, the stack traces of a majority of samples have the same top frames:<br />- PosixSemaphore::wait<br />- ZPageAllocator::alloc_page_stall<br />- ZPageAllocator::alloc_page_inner<br />- ZPageAllocator::alloc_page<br /><br />So many threads seem to spend their time waiting in the method ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic tasks thread also has a few samples where it waits in ZPageAllocator::alloc_page_stall. I hope this will help you find the issue.<br /><br />Thank you very much for reading this email until the end. I hope this is the right place for such feedback. Let me know if I should report my problem elsewhere. Feel free to ask me more questions if you need.<br /><br />Thank you all for this amazing tool!<br /><br /><br /></blockquote>
</div>
</blockquote>
</div>
</div>
</body></html>