Using JFR together with ZGC degrades application throughput
Hi all,

I would like to report a case where starting JFR for an application running with ZGC causes a significant throughput degradation (compared to when JFR is not started).

My context: I was writing a little web app to illustrate a case where ZGC gives better throughput than G1. I benchmarked the application with Grafana k6 running with G1 and running with ZGC: the runs with ZGC gave better throughput. I wanted to go a bit further in the explanation, so I repeated my benchmarks with JFR to be able to illustrate the GC gains in JMC. When I ran my web app with ZGC+JFR, I noticed a significant throughput degradation in my benchmark (which was not the case with G1+JFR). Although I did not measure the overhead as such, I still wanted to report this issue because the throughput degradation with JFR is such that it would not be usable as is on a production service.

I wrote a little application (not a web one) to reproduce the problem: it calls a small conversion service 200 times with random numbers, in parallel (to resemble a web app under load and to pressure the GC). The conversion service (a method named `convertNumberToWords`) converts the number to a String by looking it up in a Map keyed by the number. In order to instantiate and destroy many objects on each call, the map is rebuilt by parsing a huge String every time. The application ends after 200 calls.

Here are the steps to reproduce:

1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact (make sure you are on branch jfr+zgc_impact)
2. Compile it (you must include numbers200k.zip in the resources: it contains a 36 MB text file whose contents are used to create the huge String variable)
3. In the root of the repository, run:
   - `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf # ZGC without JFR`
   - `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf # ZGC with JFR`
4. The real time of the second run (with JFR) will be considerably higher than that of the first.

I ran these tests on my laptop:

- Dell Inc. Latitude 5591
- openSUSE Tumbleweed 20260108
- Kernel: 6.18.3-1-default (64-bit)
- 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
- RAM 16 GiB
- openjdk version "25.0.1" 2025-10-21
- OpenJDK Runtime Environment (build 25.0.1+8-27)
- OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing)
- many tabs open in Firefox!

I also ran it in a container (eclipse-temurin:25) on my laptop, and on a Windows laptop, and came to the same conclusions. Here are the measurements from the container:

| Run with  | Real time (s) |
|-----------|---------------|
| ZGC alone | 7.473         |
| ZGC + JFR | 25.075        |
| G1 alone  | 10.195        |
| G1 + JFR  | 10.450        |

After all these tests I ran the app with another profiling tool to understand where the issue is. I attach the flamegraph from the JFR+ZGC run: for the worker threads of the Stream's ForkJoinPool, the stack traces of a majority of samples share the same top frames:

- PosixSemaphore::wait
- ZPageAllocator::alloc_page_stall
- ZPageAllocator::alloc_page_inner
- ZPageAllocator::alloc_page

So many threads seem to spend their time waiting in ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic-tasks thread also has a few samples where it waits in ZPageAllocator::alloc_page_stall. I hope this will help you find the issue.

Thank you very much for reading this email to the end. I hope this is the right place for such feedback; let me know if I should report the problem elsewhere. Feel free to ask me more questions if you need.

Thank you all for this amazing tool!
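A rough sketch of the reproducer described above might look like the following. This is illustrative only, not the actual poc.java.perf.write.TestPerf source: the class and constant names are invented here, and a generated String stands in for the numbers200k.zip resource.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class TestPerfSketch {

    static final int LINES = 100_000; // the real reproducer parses a ~36 MB String
    static final int CALLS = 200;

    // Stand-in for numbers200k.zip: a large String of "number=words" lines.
    static String hugeSource() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < LINES; i++) {
            sb.append(i).append('=').append("word ".repeat(8)).append('\n');
        }
        return sb.toString();
    }

    // Rebuilds the whole Map from the huge String on every call, so each
    // lookup allocates (and immediately discards) a large number of objects.
    public static String convertNumberToWords(String source, int number) {
        Map<Integer, String> map = new HashMap<>();
        for (String line : source.split("\n")) {
            int eq = line.indexOf('=');
            map.put(Integer.parseInt(line.substring(0, eq)), line.substring(eq + 1));
        }
        return map.get(number);
    }

    public static void main(String[] args) {
        String source = hugeSource();
        // 200 parallel calls with random numbers, mimicking a loaded web app.
        IntStream.range(0, CALLS).parallel()
                 .forEach(i -> convertNumberToWords(source, ThreadLocalRandom.current().nextInt(LINES)));
        System.out.println("done");
    }
}
```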
Hi Fabrice,

Thanks for reporting! Could you post the source code for the reproducer here? The 36 MB file could probably be replaced with a String::repeat expression.

JFR does use some memory, which could reduce the available heap and affect performance, although the degradation you're seeing seems awfully high.

Thanks
Erik
Here is a single source file for the reproducer (the big String is now generated at startup, as you suggested). It changes the results a little, but the run with ZGC + JFR still takes a lot more time.

Thank you for having a look.

Fabrice

Le 2026-01-12 10:56, Erik Gahlin a écrit :
Hi Fabrice,
Thanks for reporting!
Could you post the source code for the reproducer here? The 36 MB file could probably be replaced with a String::repeat expression.
JFR does use some memory, which could impact available heap and performance, although the degradation you're seeing seems awfully high.
Thanks Erik
Hi Fabrice,

Thank you very much for reporting this, and also for providing a great reproducer. We have made some progress towards understanding the problem space, at least.

To help you continue with your demonstrations, explanations, and comparisons, I only need you to do the following. In the jdk/lib/jfr directory, there are two files that control the default and profile sets of JFR events: default.jfc and profile.jfc, respectively. They contain an entry like this:

<event name="jdk.OldObjectSample">
  <setting name="enabled" control="old-objects-enabled">false</setting>
  <setting name="stackTrace" control="old-objects-stack-trace">false</setting>
  <setting name="cutoff" control="old-objects-cutoff">0 ns</setting>
</event>

Turn off the jdk.OldObjectSample event by setting enabled to false. This effectively turns off JFR's capability to monitor memory leaks in the background. With this small change, you should be back on track for proper comparisons, also when using JFR.

Let me know if you have any questions. We will be thinking about how to solve this properly.

Cheers for now

Regards
Markus

Confidential - Oracle Internal
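If you prefer not to edit the files inside the JDK installation, the modified configuration can be kept as a copy and passed to JFR explicitly via the settings option of -XX:StartFlightRecording. A sketch of that workflow (the file name my-default.jfc is illustrative):

```shell
# Copy the default configuration out of the JDK installation.
cp "$JAVA_HOME/lib/jfr/default.jfc" my-default.jfc

# Edit my-default.jfc so the jdk.OldObjectSample settings read "false",
# then start the recording with the modified settings file:
time java -Xmx4g -XX:+UseZGC -XX:StartFlightRecording:settings=my-default.jfc \
     -classpath target/classes poc.java.perf.write.TestPerf
```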
Hi again,

I just remembered we have improved our ergonomics over the years. Therefore, there is a much easier way to do this without configuring anything in the .jfc files: you can simply override event settings on the command line [1]:

-XX:StartFlightRecording:jdk.OldObjectSample#enabled=false

Way easier!

Cheers
Markus

[1] https://egahlin.github.io/2022/05/31/improved-ergonomics.html
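For recordings started from code rather than the command line, the same override should be possible through the jdk.jfr API. A minimal sketch (not from the thread; the surrounding application code is omitted):

```java
import jdk.jfr.Recording;

public class RecordingWithoutLeakProfiler {
    public static void main(String[] args) {
        try (Recording recording = new Recording()) {
            // Disable the leak-profiler event before starting, mirroring
            // -XX:StartFlightRecording:jdk.OldObjectSample#enabled=false
            recording.disable("jdk.OldObjectSample");
            recording.start();
            // ... run the workload to be profiled ...
            recording.stop();
        }
    }
}
```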
Hi,

Yes, turning off the jdk.OldObjectSample event solved the issue: the real execution time of my sample with ZGC and a JFR recording (with jdk.OldObjectSample disabled) is now very close to the time without JFR.

Thank you very much. Best regards.

Fabrice

Le 2026-01-14 19:30, Markus Gronlund a écrit :
Hi again,
I just remembered we have improved our ergonomics over the years.
Therefore, there is a much easier way for you to do this without configuring anything in the .jfc files: you can simply override event settings on the command line. [1]
-XX:StartFlightRecording:jdk.OldObjectSample#enabled=false
Way easier!
Cheers
Markus
[1] https://egahlin.github.io/2022/05/31/improved-ergonomics.html
Confidential- Oracle Internal
From: hotspot-jfr-dev <hotspot-jfr-dev-retn@openjdk.org> On Behalf Of Markus Gronlund Sent: Wednesday, 14 January 2026 19:15 To: Fabrice Bibonne <fabrice.bibonne@courriel.eco> Cc: hotspot-jfr-dev@openjdk.org Subject: RE: Using JFR both with ZGC degrades application throughput
Hi Fabrice,
Thank you very much for reporting this and also for providing a great reproducer.
We have made some progress towards understanding the problem space, at least.
To help you continue with your demonstrations, explanations, and comparisons, I only need you to do the following:
In the jdk/lib/jfr directory, there are two files that control the default and profile sets of JFR events: default.jfc and profile.jfc, respectively.
<event name="jdk.OldObjectSample">
<setting name="enabled" control="old-objects-enabled">false</setting>
<setting name="stackTrace" control="old-objects-stack-trace">false</setting>
<setting name="cutoff" control="old-objects-cutoff">0 ns</setting>
</event>
Turn off the jdk.OldObjectSample event by setting enabled to false.
This effectively turns off JFRs capability to monitor memory leaks in the background.
With this small change, you should be back on track for proper comparisons, also when using JFR.
Let me know if you have any questions. We will be thinking about how to solve this properly.
Cheers for now
Regards
Markus
Confidential- Oracle Internal
From: hotspot-jfr-dev <hotspot-jfr-dev-retn@openjdk.org> On Behalf Of Fabrice Bibonne Sent: Monday, 12 January 2026 16:59 To: hotspot-jfr-dev@openjdk.org Subject: Re: Using JFR both with ZGC degrades application throughput
Here is a unique source code file for the reproducer (the big String is generated when starting as you suggested). It changes a little the results but the run with zgc + jfr is still taking lot of time.
Thanks you for having a look.
Fabrice
Le 2026-01-12 10:56, Erik Gahlin a écrit :
Hi Fabrice,
Thanks for reporting!
Could you post the source code for the reproducer here? The 36 MB file could probably be replaced with a String::repeat expression.
JFR does use some memory, which could impact available heap and performance, although the degradation you're seeing seems awfully high.
Thanks Erik
________________________________________ From: hotspot-jfr-dev <hotspot-jfr-dev-retn@openjdk.org> on behalf of Fabrice Bibonne <fabrice.bibonne@courriel.eco> Sent: Sunday, January 11, 2026 7:23 PM To: hotspot-jfr-dev@openjdk.org Subject: Using JFR both with ZGC degrades application throughput
Hi all,
I would like to report a case where starting jfr for an application running with zgc causes a significant throughput degradation (compared to when JFR is not started).
My context : I was writing a little web app to illustrate a case where the use of ZGC gives a better throughput than with G1. I benchmarked with grafana k6 my application running with G1 and my application running with ZGC : the runs with ZGC gave better throughputs. I wanted to go a bit further in explanation so I began again my benchmarks with JFR to be able to illustrate GC gains in JMC. When I ran my web app with ZGC+JFR, I noticed a significant throughput degradation in my benchmark (which was not the case with G1+JFR).
Although I did not measure an increase in overhead as such, I still wanted to report this issue because the degradation in throughput with JFR is such that it would not be usable as is on a production service.
I wrote a little application (not a web one) to reproduce the problem : the application calls a little conversion service 200 times with random numbers in parallel (to be like a web app in charge and to pressure GC). The conversion service (a method named `convertNumberToWords`) convert the number in a String looking for the String in a Map with the number as th key. In order to instantiate and destroy many objects at each call, the map is built parsing a huge String at each call. Application ends after 200 calls.
Here are the step to reproduce : 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact [1] (be aware to be on branch jfr+zgc_impact) 2. Compile it (you must include numbers200k.zip in resources : it contains a 36 Mo text files whose contents are used to create the huge String variable) 3. in the root of repository : 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR` 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf #ZGC with JFR` 4. The real time of the second run (with JFR) will be considerably higher than that of the first
I ran these tests on my laptop : - Dell Inc. Latitude 5591 - openSUSE Tumbleweed 20260108 - Kernel : 6.18.3-1-default (64-bit) - 12 × Intel(R) Core(tm) i7-8850H CPU @ 2.60GHz - RAM 16 Gio - openjdk version "25.0.1" 2025-10-21 - OpenJDK Runtime Environment (build 25.0.1+8-27) - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) - many tabs opened in firefox !
I also ran it in a container (eclipse-temurin:25) on my laptop and with a windows laptop and came to the same conclusions : here are the measurements from the container :
| Run with | Real time (s) | |-----------|---------------| | ZGC alone | 7.473 | | ZGC + jfr | 25.075 | | G1 alone | 10.195 | | G1 + jfr | 10.450 |
After all these tests I tried to run the app with an other profiler tool in order to understand where is the issue. I join the flamegraph when running jfr+zgc : for the worker threads of the ForkJoinPool of Stream, stack traces of a majority of samples have the same top lines : - PosixSemaphore::wait - ZPageAllocator::alloc_page_stall - ZPageAllocator::alloc_page_inner - ZPageAllocator::alloc_page
So many threads seem to spend their time waiting in ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic tasks thread also has a few samples where it waits in ZPageAllocator::alloc_page_stall. I hope this will help you to find the issue.
Thank you very much for reading this email to the end. I hope this is the right place for such feedback; let me know if I should report the problem elsewhere. Feel free to ask me more questions if you need.
Thank you all for this amazing tool !
Links:
------
[1] https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact
[2] https://egahlin.github.io/2022/05/31/improved-ergonomics.html
Hi,

While I cannot answer the question about why using JFR takes so much additional time, reading about your benchmark setup the following things came to my mind:

* -XX:+UseCompressedOops does nothing for ZGC (ZGC does not support compressed oops at all), and G1 will enable it automatically. You can leave it off.

* G1 having significantly worse throughput than ZGC is very rare, and even then the extent you show is quite large. Taking some of the content together (4g heap, Maps, huge String variables) indicates that you might have run into a well-known pathology of G1 with large objects: the application might waste up to 50% of your heap due to these humongous objects [0]. G1 might also work better in JDK 26, as an enhancement for one particular case has been added. More is being worked on.

  TL;DR: Your application might run much better with a large(r) G1HeapRegionSize setting. Or just upgrade to JDK 26.

* While ZGC does not have that (in some cases extreme) memory wastage for large allocations, there is still some. Adding JFR might just push it over the edge (the stacks you showed are about finding a new empty page/region for allocation, failing to do so, doing a GC, stalling and waiting).

Hth,
Thomas

[0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html

On 11.01.26 19:23, Fabrice Bibonne wrote:
Thank you for your advice. I just give a few precisions in a few lines:

* For `-XX:+UseCompressedOops`, I must admit I did not know this option: I added it because JDK Mission Control warned me about it in "Automated analysis result" after a first try (<<Compressed Oops is turned off [...]. Use the JVM argument '-XX:+UseCompressedOops' to enable this feature>>).

* It is true that the application wastes time in GC pauses (46.6% of the time with G1): I wanted an example app which uses the GC a lot. Maybe this is a little too much compared to real apps (even if, for some of them, we may wonder...).

* The stack I showed about finding a new empty page/region for allocation is present in both cases (with JFR and without JFR). But in the case with JFR it is much wider: it takes many more samples.

Best regards,
Fabrice

On 2026-01-12 14:18, Thomas Schatzl wrote:
Hi,

On 13.01.26 05:36, Fabrice Bibonne wrote:
> Thank you for your advice. I just give a few precisions in a few lines:
>
> * For `-XX:+UseCompressedOops`, I must admit I did not know this option: I added it because JDK Mission Control warned me about it in "Automated analysis result" after a first try (<<Compressed Oops is turned off [...]. Use the JVM argument '-XX:+UseCompressedOops' to enable this feature>>).
Maybe JMC should not provide this hint for ZGC then (not directed towards you).
> * It is true that the application wastes time in GC pauses (46.6% of the time with G1): I wanted an example app which uses the GC a lot. Maybe this is a little too much compared to real apps (even if, for some of them, we may wonder...).
What I am saying is that while the results are as they are for you, I suspect that the result is not representative for G1, as it exercises a pathology that could (and unfortunately must, if it is really the case) be resolved by a single command line switch by the user. The G1 GC algorithm would need prior knowledge of the application it is running to resolve this automatically.

Having had a look at G1's behavior, the reason for the low performance is likely due to heap sizing heuristics issues: G1 does not expand the heap as aggressively as ZGC. The upside, obviously, is that it uses (much) less memory. ;)

More technical explanation:

* In the presence of full collections ([0], in the process of being fixed right now), G1 does not expand the heap, running with maybe half of what ZGC uses. This is due to the behavior of the application.

* Even when working around that via some command line options (-XX:MaxHeapFreeRatio=100), the runtime of the application is so short that it takes too long to reach the same heap size as ZGC (for reasons we can discuss if you want).

* The mentioned issue with large objects, i.e. G1 wasting too much memory, also contributes.

Interestingly, I only observed this on slower systems; these issues do not show on faster ones, e.g. on some x64 workstation (limited to 10 threads). On that workstation, G1 is 2x faster than ZGC with the settings you gave already. However, on some smallish AArch64 VM it is around the same performance (slightly slower). This is probably what you are seeing on your laptop (which may also experience aggressive throttling without precautions).

TL;DR: If you set the minimum heap size and the region size, G1 is 2x faster than ZGC here too on that slower AArch64 machine (with -Xms4g -Xmx4g -XX:G1HeapRegionSize=8m).

(Fwiw, for maximum throughput we recommend setting minimum and maximum heap size to the same value irrespective of the garbage collector; see the recommendations in our performance guide [1]. It also describes the issue with humongous objects. We are working on improving both issues right now.)

Another observation is that with ZGC, although overall throughput is faster than with G1 in your original example, its allocation stalls are in the range of hundreds of milliseconds, while G1 pauses are at most 50ms. So the "experience" with that original web app may be better with G1 even if it is slower overall :P (We do not recommend running latency-oriented programs at that CPU load level either way, but just noticing.)
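To make the humongous-object waste concrete, here is a back-of-the-envelope sketch (an editorial illustration, not from this thread: it assumes the G1 rule that a humongous object occupies whole regions, rounded up, and that the unused tail of the last region is wasted; the 4 MB region size is an assumed example value, not what the JVM actually picked here):

```java
// Illustrates why a larger G1HeapRegionSize can reduce waste: a humongous
// allocation of `size` bytes occupies ceil(size / regionSize) whole regions,
// and the unused tail of the last region is lost. Worst case: an object just
// over half a region still takes a full region (~50% waste).
public class HumongousWasteSketch {

    static long occupied(long size, long regionSize) {
        long regions = (size + regionSize - 1) / regionSize; // round up
        return regions * regionSize;
    }

    static double wasteFraction(long size, long regionSize) {
        long occ = occupied(size, regionSize);
        return (occ - size) / (double) occ;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // Worst case: an object of just over half a region.
        System.out.printf("2MB+1 object, 4MB regions: %.0f%% waste%n",
                100 * wasteFraction(2 * mb + 1, 4 * mb));
        // A large object spanning many regions wastes proportionally less.
        System.out.printf("36MB object, 4MB regions: %.1f%% waste%n",
                100 * wasteFraction(36 * mb + 16, 4 * mb));
    }
}
```

With many such objects in flight, the per-object waste adds up, which is consistent with the suggestion above to run with a large(r) G1HeapRegionSize (e.g. -Xms4g -Xmx4g -XX:G1HeapRegionSize=8m).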
> * The stack I showed about finding a new empty page/region for allocation is present in both cases (with JFR and without JFR). But in the case with JFR it is much wider: it takes many more samples.
Problems tend to exacerbate themselves, i.e. beyond a certain allocation rate, above what the GC can sustain, performance can quickly (non-linearly) deteriorate, e.g. because of the need to fall back to different, slower algorithms. Without JFR I am already seeing that almost all GCs are caused by allocation stalls; adding to that will not help.

Looking around in the ZGC logs a bit, with StartFlightRecording there seems to be much more so-called in-place object movement (i.e. instead of copying live objects to a new place and then freeing the old, now empty, space, the objects are moved "down" the heap to fill gaps), which is a lot more expensive. This shows in the garbage collection pauses, changing from hundreds of ms to seconds.

As mentioned above, it looks like just that little extra memory usage causes ZGC to go into some very slow mode to free memory and avoid OOME.

Hth,
Thomas

[0] https://bugs.openjdk.org/browse/JDK-8238686
[1] https://docs.oracle.com/en/java/javase/25/gctuning/garbage-first-garbage-col...
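One way a reader could observe the behavior described above is to capture unified GC logs for both runs and compare them. A sketch, reusing the class path and main class from the repro steps earlier in the thread (`-Xlog:gc*` is the standard JVM unified-logging syntax; the log file names are arbitrary, and the exact log message wording may vary by JDK version):

```shell
# Detailed GC log for the ZGC run without JFR
time java -Xmx4g -XX:+UseZGC -Xlog:gc*:file=zgc-plain.log \
     -classpath target/classes poc.java.perf.write.TestPerf

# Same run with JFR started
time java -Xmx4g -XX:+UseZGC -XX:StartFlightRecording \
     -Xlog:gc*:file=zgc-jfr.log \
     -classpath target/classes poc.java.perf.write.TestPerf

# Compare how often each run stalled waiting for memory
grep -c "Allocation Stall" zgc-plain.log zgc-jfr.log
```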
Participants (4):
- Erik Gahlin
- Fabrice Bibonne
- Markus Gronlund
- Thomas Schatzl