Using JFR together with ZGC degrades application throughput
Thomas Schatzl
thomas.schatzl at oracle.com
Mon Jan 12 13:18:47 UTC 2026
Hi,
while I cannot answer the question of why using JFR takes so much
additional time, the following things came to mind when reading about
your benchmark setup:
* -XX:+UseCompressedOops does nothing for ZGC (ZGC does not support
compressed oops at all), and G1 enables it automatically. You can leave
it off.
* G1 having significantly worse throughput than ZGC is very rare, and
even then the extent you show is quite large. Taking some of the
details together (4g heap, Maps, huge String variables) indicates that
you might have run into a well-known pathology of G1 with large
objects: up to 50% of your heap might be wasted due to these humongous
objects [0].
G1 might also work better in JDK 26, as an enhancement for one
particular case has been added. More improvements are being worked on.
TL;DR: Your application might run much better with a large(r)
G1HeapRegionSize setting (see the example command after this list), or
just by upgrading to JDK 26.
* While ZGC does not show that, in some cases extreme, memory wastage
for large allocations, there is still some. Adding JFR might just push
it over the edge (the stacks you showed are about finding a new empty
page/region for allocation, failing to do so, doing a GC, then stalling
and waiting; the logging example below shows one way to confirm this).
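For the humongous-object point, a first experiment could be a run with
a larger region size, e.g. (32m is just a value to experiment with, not
a tuned recommendation):

  time java -Xmx4g -XX:+UseG1GC -XX:G1HeapRegionSize=32m \
       -classpath target/classes poc.java.perf.write.TestPerf

Objects larger than half a region are allocated as humongous objects,
so larger regions mean fewer allocations take the humongous path.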
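To check whether ZGC allocation stalls are indeed what slows down the
JFR run, ZGC reports them in the GC log, so something like:

  time java -Xmx4g -XX:+UseZGC -XX:StartFlightRecording -Xlog:gc \
       -classpath target/classes poc.java.perf.write.TestPerf

should print "Allocation Stall (<thread name>) <duration>" lines
whenever a thread has to wait for the collector to free up memory.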
Hth,
Thomas
[0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html
On 11.01.26 19:23, Fabrice Bibonne wrote:
> Hi all,
>
> I would like to report a case where starting JFR for an application
> running with ZGC causes a significant throughput degradation (compared
> to when JFR is not started).
>
> My context: I was writing a little web app to illustrate a case where
> the use of ZGC gives better throughput than G1. I benchmarked my
> application with Grafana k6, running with G1 and then with ZGC: the
> runs with ZGC gave better throughput. I wanted to go a bit further in
> the explanation, so I ran my benchmarks again with JFR to be able to
> illustrate the GC gains in JMC. When I ran my web app with ZGC+JFR, I
> noticed a significant throughput degradation in my benchmark (which
> was not the case with G1+JFR).
>
> Although I did not measure the overhead as such, I still wanted to
> report this issue because the degradation in throughput with JFR is
> such that it would not be usable as-is on a production service.
>
> I wrote a little application (not a web one) to reproduce the problem:
> the application calls a little conversion service 200 times with
> random numbers in parallel (to be like a web app under load and to put
> pressure on the GC). The conversion service (a method named
> `convertNumberToWords`) converts the number to a String by looking the
> String up in a Map with the number as the key. In order to instantiate
> and destroy many objects at each call, the map is rebuilt by parsing a
> huge String at each call. The application ends after 200 calls. A
> simplified sketch follows below.
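> Roughly, the workload looks like this (a simplified sketch, not the
> exact code from the repository; everything except
> `convertNumberToWords` and `TestPerf` is illustrative):
>
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Random;
> import java.util.stream.IntStream;
>
> public class TestPerf {
>
>     // Stand-in for the 36 MB text resource, built once at startup
>     private static final String HUGE = buildHugeString();
>
>     public static void main(String[] args) {
>         Random random = new Random();
>         // 200 parallel calls, mimicking a web app under load
>         IntStream.range(0, 200).parallel()
>                  .forEach(i -> convertNumberToWords(random.nextInt(200_000)));
>     }
>
>     static String convertNumberToWords(int number) {
>         // Rebuild the map from the huge String on every call, so that
>         // each call creates and abandons a large number of objects
>         Map<Integer, String> words = new HashMap<>();
>         for (String line : HUGE.split("\n")) {
>             int sep = line.indexOf('=');
>             words.put(Integer.parseInt(line.substring(0, sep)),
>                       line.substring(sep + 1));
>         }
>         return words.get(number);
>     }
>
>     private static String buildHugeString() {
>         StringBuilder sb = new StringBuilder();
>         for (int i = 0; i < 200_000; i++) {
>             sb.append(i).append('=').append("words for ").append(i).append('\n');
>         }
>         return sb.toString();
>     }
> }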
>
> Here are the steps to reproduce:
> 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact
> (make sure you are on the branch jfr+zgc_impact)
> 2. Compile it (you must include numbers200k.zip in the resources: it
> contains a 36 MB text file whose content is used to create the huge
> String variable)
> 3. In the root of the repository:
> 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops
> -classpath target/classes poc.java.perf.write.TestPerf` # ZGC without JFR
> 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops
> -XX:StartFlightRecording -classpath target/classes
> poc.java.perf.write.TestPerf` # ZGC with JFR
> 4. The real time of the second run (with JFR) will be considerably
> higher than that of the first.
>
> I ran these tests on my laptop:
> - Dell Inc. Latitude 5591
> - openSUSE Tumbleweed 20260108
> - Kernel : 6.18.3-1-default (64-bit)
> - 12 × Intel® Core™ i7-8850H CPU @ 2.60GHz
> - RAM: 16 GiB
> - openjdk version "25.0.1" 2025-10-21
> - OpenJDK Runtime Environment (build 25.0.1+8-27)
> - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing)
> - many tabs open in Firefox!
>
> I also ran it in a container (eclipse-temurin:25) on my laptop and on
> a Windows laptop and came to the same conclusions; here are the
> measurements from the container:
>
> | Run with  | Real time (s) |
> |-----------|---------------|
> | ZGC alone |         7.473 |
> | ZGC + JFR |        25.075 |
> | G1 alone  |        10.195 |
> | G1 + JFR  |        10.450 |
>
>
> After all these tests, I tried to run the app with another profiler
> tool in order to understand where the issue is. I attach the
> flamegraph from running JFR+ZGC: for the worker threads of the
> Stream's ForkJoinPool, the stack traces of a majority of samples have
> the same top frames:
> - PosixSemaphore::wait
> - ZPageAllocator::alloc_page_stall
> - ZPageAllocator::alloc_page_inner
> - ZPageAllocator::alloc_page
>
> So many threads seem to spend their time waiting in
> ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic
> tasks thread also has a few samples where it waits in
> ZPageAllocator::alloc_page_stall. I hope this will help you find the
> issue.
>
> Thank you very much for reading this email until the end. I hope this
> is the right place for such feedback. Let me know if I should report
> my problem elsewhere. Feel free to ask me more questions if you need.
>
> Thank you all for this amazing tool!
>
>