Using JFR together with ZGC degrades application throughput
Thomas Schatzl
thomas.schatzl at oracle.com
Tue Jan 13 10:06:07 UTC 2026
Hi,
On 13.01.26 05:36, Fabrice Bibonne wrote:
> Thank you for your advice, I will just add a few clarifications in a few lines:
>
> * for `-XX:+UseCompressedOops`, I must admit I did not know this option:
> I added it because JDK Mission Control warned me about it in "Automated
> analysis result" after a first try ("Compressed Oops is turned off
> [...]. Use the JVM argument '-XX:+UseCompressedOops' to enable this
> feature")
Maybe JMC should not provide this hint for ZGC then (not directed
towards you).
>
> * it is true that the application wastes time in GC pauses (46.6% of
> the time with G1): I wanted an example app which uses the GC a lot.
> Maybe this is a little too much compared to real apps (even if for some
> of them, we may wonder...).
What I am saying is that while the results are what they are for you, I
suspect the result is not representative for G1, as the benchmark
exercises a pathology that could (and, unfortunately, if that is really
the case, must) be resolved by the user with a single command line
switch.
The G1 GC algorithm would need prior knowledge of the application it is
running to automatically resolve this.
Having had a look at G1's behavior, the reason for the low performance
is likely heap sizing heuristics: G1 does not expand the heap as
aggressively as ZGC.
The upside obviously is that it uses (much) less memory. ;)
More technical explanation:
* in the presence of full collections ([0], in the process of being
fixed right now), G1 does not expand the heap, running with maybe half
of what ZGC uses. This is due to the behavior of the application.
* even when working around the bug via a command line option
(-XX:MaxHeapFreeRatio=100), the runtime of the application is too short,
i.e. even with that workaround applied, it takes too long to get to the
same heap size as ZGC, for reasons we can discuss if you want (a rough
example command is sketched below this list).
* the mentioned issue with large objects, i.e. G1 wasting too much
memory, also contributes.
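For reference, a rough sketch of what applying that workaround to your
reproducer could look like (untested in exactly this form; classpath and
main class taken from your reproduction steps below, and -Xlog:gc is
only there to print the collections and heap sizes so you can see
whether the heap actually grows):
   time java -Xmx4g -XX:+UseG1GC -XX:MaxHeapFreeRatio=100 -Xlog:gc \
       -classpath target/classes poc.java.perf.write.TestPerf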
Interestingly, I only observed this on slower systems; these issues do
not show up on faster ones, e.g. on an x64 workstation (limited to 10
threads). On that workstation, G1 is already 2x faster than ZGC with the
settings you gave. However, on a smallish AArch64 VM it is around the
same performance (slightly slower). This is probably what you are seeing
on your laptop (which may also experience aggressive throttling without
precautions).
TL;DR: If you set the minimum heap size and the region size, G1 is 2x
faster than ZGC (with -Xms4g -Xmx4g -XX:G1HeapRegionSize=8m) on that
slower AArch64 machine here too.
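I.e., roughly (again reusing the classpath and main class from your
steps, untested in exactly this form):
   time java -Xms4g -Xmx4g -XX:+UseG1GC -XX:G1HeapRegionSize=8m \
       -classpath target/classes poc.java.perf.write.TestPerf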
(Fwiw, for maximum throughput we recommend setting minimum and maximum
heap size to the same value irrespective of the garbage collector; see
the recommendations in our performance guide [1]. It also describes the
issue with humongous objects. We are working on improving both issues
right now.)
Another observation is that with ZGC, although overall throughput is
higher than with G1 in your original example, its allocation stalls are
in the range of hundreds of milliseconds, while G1 pauses are at most
50ms. So the "experience" with that original web app may be better with
G1 even if it is slower overall :P (We do not recommend running
latency-oriented programs at that CPU load level either way, just
noting).
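If you want to reproduce that comparison, something along these lines
should do; -Xlog:gc prints the per-collection times for both collectors,
and as far as I remember ZGC also reports its allocation stalls there
(the exact messages differ a bit between releases):
   time java -Xmx4g -XX:+UseZGC -Xlog:gc \
       -classpath target/classes poc.java.perf.write.TestPerf
   time java -Xmx4g -XX:+UseG1GC -Xlog:gc \
       -classpath target/classes poc.java.perf.write.TestPerf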
> * the stack I showed about finding a new empty page/region for
> allocation is present in both cases (with JFR and without JFR). But in
> the case with JFR, it is much wider: it contains many more samples.
Problems tend to exacerbate themselves, i.e. beyond a certain threshold
of allocation rate above what the GC can sustain, performance can
quickly (non-linearly) deteriorate, e.g. because of the need to use
different, slower algorithms.
Without JFR I am already seeing that almost all GCs are caused by
allocation stalls. Adding to that will not help.
Looking around in the ZGC logs a bit, with StartFlightRecording there
seems to be much more so-called in-place object movement (i.e. instead
of copying live objects to a new place and then freeing the old, now
empty space, the objects are moved "down" the heap to fill gaps), which
is a lot more expensive.
This shows in the garbage collection pauses, which change from hundreds
of ms to seconds.
As mentioned above, it looks like just that little extra memory usage
causes ZGC to go into some very slow mode to free memory and avoid OOME.
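If you want to look at this yourself, comparing GC logs of your
reproducer with and without -XX:StartFlightRecording should show it;
something like the following is a reasonable starting point (the exact
log tags and message wording vary a bit between JDK releases, so treat
this as a sketch):
   time java -Xmx4g -XX:+UseZGC -Xlog:gc* \
       -classpath target/classes poc.java.perf.write.TestPerf
   time java -Xmx4g -XX:+UseZGC -XX:StartFlightRecording -Xlog:gc* \
       -classpath target/classes poc.java.perf.write.TestPerf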
Hth,
Thomas
[0] https://bugs.openjdk.org/browse/JDK-8238686
[1] https://docs.oracle.com/en/java/javase/25/gctuning/garbage-first-garbage-collector-tuning.html
>
> Best regards,
>
> Fabrice
>
>
> On 2026-01-12 14:18, Thomas Schatzl wrote:
>
>> Hi,
>>
>> while I am not able to answer the question about why using JFR
>> takes so much additional time, when reading about your benchmark setup
>> the following things came to my mind:
>>
>> * -XX:+UseCompressedOops for ZGC does nothing (ZGC does not support
>> compressed oops at all), and G1 will automatically use it. You can
>> leave it off.
>>
>> * G1 having a significantly worse throughput than ZGC is very rare;
>> even then, the extent you show is quite large. Taking some of the
>> content together (4g heap, Maps, huge string variables) indicates that
>> you might have run into a well-known pathology of G1 with large
>> objects: the application might waste up to 50% of the heap due to
>> these humongous objects [0].
>> G1 might work better in JDK 26 too, as an enhancement for one
>> particular case has been added. More is being worked on.
>>
>> TL;DR: Your application might run much better with a large(r)
>> G1HeapRegionSize setting, or just by upgrading to JDK 26.
>>
>> * While ZGC does not have that (in some cases extreme) memory wastage
>> for large allocations, there is still some. Adding JFR might just push
>> it over the edge (the stack traces you showed are about finding a new
>> empty page/region for allocation, failing to do so, doing a GC,
>> stalling and waiting).
>>
>> Hth,
>> Thomas
>>
>> [0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html
>>
>> On 11.01.26 19:23, Fabrice Bibonne wrote:
>>> Hi all,
>>>
>>> I would like to report a case where starting JFR for an application
>>> running with ZGC causes a significant throughput degradation
>>> (compared to when JFR is not started).
>>>
>>> My context: I was writing a little web app to illustrate a case
>>> where the use of ZGC gives better throughput than G1. I benchmarked
>>> my application with Grafana k6, running with G1 and running with
>>> ZGC: the runs with ZGC gave better throughput. I wanted to go a bit
>>> further in the explanation, so I ran my benchmarks again with JFR to
>>> be able to illustrate the GC gains in JMC. When I ran my web app with
>>> ZGC+JFR, I noticed a significant throughput degradation in my
>>> benchmark (which was not the case with G1+JFR).
>>>
>>> Although I did not measure an increase in overhead as such, I still
>>> wanted to report this issue because the degradation in throughput
>>> with JFR is such that it would not be usable as is on a production
>>> service.
>>>
>>> I wrote a little application (not a web one) to reproduce the
>>> problem: the application calls a little conversion service 200 times
>>> with random numbers in parallel (to be like a web app under load and
>>> to put pressure on the GC). The conversion service (a method named
>>> `convertNumberToWords`) converts the number into a String by looking
>>> up the String in a Map with the number as the key. In order to
>>> instantiate and destroy many objects at each call, the map is built
>>> by parsing a huge String at each call. The application ends after 200
>>> calls.
>>>
>>> Here are the steps to reproduce:
>>> 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact
>>> (make sure you are on branch jfr+zgc_impact)
>>> 2. Compile it (you must include numbers200k.zip in resources: it
>>> contains a 36 MB text file whose contents are used to create the
>>> huge String variable)
>>> 3. In the root of the repository:
>>> 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf` # ZGC without JFR
>>> 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf` # ZGC with JFR
>>> 4. The real time of the second run (with JFR) will be considerably
>>> higher than that of the first
>>>
>>> I ran these tests on my laptop:
>>> - Dell Inc. Latitude 5591
>>> - openSUSE Tumbleweed 20260108
>>> - Kernel : 6.18.3-1-default (64-bit)
>>> - 12 × Intel® Core™ i7-8850H CPU @ 2.60GHz
>>> - RAM 16 GiB
>>> - openjdk version "25.0.1" 2025-10-21
>>> - OpenJDK Runtime Environment (build 25.0.1+8-27)
>>> - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing)
>>> - many tabs open in Firefox!
>>>
>>> I also ran it in a container (eclipse-temurin:25) on my laptop and
>>> on a Windows laptop and came to the same conclusions; here are the
>>> measurements from the container:
>>>
>>> | Run with | Real time (s) |
>>> |-----------|---------------|
>>> | ZGC alone | 7.473 |
>>> | ZGC + jfr | 25.075 |
>>> | G1 alone | 10.195 |
>>> | G1 + jfr | 10.450 |
>>>
>>>
>>> After all these tests I tried to run the app with another profiler
>>> tool in order to understand where the issue is. I attach the
>>> flamegraph from running jfr+zgc: for the worker threads of the
>>> Stream's ForkJoinPool, the stack traces of a majority of samples have
>>> the same top lines:
>>> - PosixSemaphore::wait
>>> - ZPageAllocator::alloc_page_stall
>>> - ZPageAllocator::alloc_page_inner
>>> - ZPageAllocator::alloc_page
>>>
>>> So many threads seem to spend their time waiting in the method
>>> ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic
>>> tasks thread also has a few samples where it waits in
>>> ZPageAllocator::alloc_page_stall. I hope this will help you to find
>>> the issue.
>>>
>>> Thank you very much for reading this email until the end. I hope
>>> this is the right place for such feedback. Let me know if I should
>>> report my problem elsewhere. Feel free to ask me more questions if
>>> you need.
>>>
>>> Thank you all for this amazing tool!
>>>
>>>