Using JFR both with ZGC degrades application throughput

Thomas Schatzl thomas.schatzl at oracle.com
Tue Jan 13 10:06:07 UTC 2026


Hi,

On 13.01.26 05:36, Fabrice Bibonne wrote:
> Thank you for your advice, I will just add a few clarifications in a few lines:
> 
> * for `-XX:+UseCompressedOops`, I must admit I did not know this option: 
> I added it because JDK Mission Control warned me about it in "Automated 
> analysis result" after a first try ("Compressed Oops is turned off 
> [...]. Use the JVM argument '-XX:+UseCompressedOops' to enable this 
> feature")

Maybe JMC should not provide this hint for ZGC then (not directed 
towards you).

> 
> * it is true that the application wastes time in GC pauses (46.6% of the 
> time with G1): I wanted an example app which uses the GC a lot. Maybe 
> this is a little too much compared to real apps (even if for some of 
> them, we may wonder...).

What I am saying is that while the results are what they are for you, I 
suspect they are not representative for G1: the benchmark exercises a 
pathology that can (and unfortunately, if that is really the case, must) 
be resolved by the user with a single command line switch.

The G1 GC algorithm would need prior knowledge of the application it is 
running to automatically resolve this.

Having had a look at G1's behavior, the low performance is most likely 
due to heap sizing heuristics: G1 does not expand the heap as 
aggressively as ZGC.

The upside obviously is that it uses (much) less memory. ;)

More technical explanation:

* in the presence of full collections ([0], in the process of being fixed 
right now), G1 does not expand the heap, running with maybe half of what 
ZGC uses. This is due to the behavior of the application.

* even when working around that bug via command line options 
(-XX:MaxHeapFreeRatio=100; see the example command after this list), the 
runtime of the application is short, i.e. even with that workaround 
applied it takes too long to reach the same heap size as ZGC, for 
reasons we can discuss if you want.

* the previously mentioned issue with large objects, i.e. G1 wasting too 
much memory, also contributes.
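
For reference, a sketch of how one might try that workaround on the 
reproducer from this thread (classpath and main class taken from the 
reproduction steps quoted below, everything else is my assumption about 
your setup):

   # assumption: same classpath/main class as in the reproduction steps below
   time java -Xmx4g -XX:+UseG1GC -XX:MaxHeapFreeRatio=100 \
        -classpath target/classes poc.java.perf.write.TestPerf

As noted, given the short runtime this alone may not close the gap.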

Interestingly, I only observed this on slower systems; these issues do 
not show on faster ones, e.g. on an x64 workstation (limited to 10 
threads). On that workstation G1 is already 2x faster than ZGC with the 
settings you gave. However, on a smallish AArch64 VM the two are around 
the same performance (G1 slightly slower). This is probably what you are 
seeing on your laptop (which may also suffer from aggressive thermal 
throttling unless precautions are taken).

TL;DR: If you set the minimum heap size and the region size (-Xms4g 
-Xmx4g -XX:G1HeapRegionSize=8m), G1 is 2x faster than ZGC here on that 
slower AArch64 machine too.

(Fwiw, for maximum throughput we recommend setting minimum and maximum 
heap size to the same value irrespective of the garbage collector, see 
the recommendations in our performance guide [1]. It also describes the 
issue with humongous objects. We are working on improving both issues 
right now.)
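
A sketch of a full command line with those settings applied to the 
reproducer (again assuming the classpath and main class from the 
reproduction steps quoted below):

   # G1 with fixed heap size and larger regions (settings from the TL;DR above)
   time java -Xms4g -Xmx4g -XX:+UseG1GC -XX:G1HeapRegionSize=8m \
        -classpath target/classes poc.java.perf.write.TestPerf

The larger region size mainly addresses the humongous object waste, the 
equal -Xms/-Xmx the heap sizing heuristics mentioned above.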

Another observation is that with ZGC, although overall throughput is 
higher than with G1 in your original example, its allocation stalls are 
in the range of hundreds of milliseconds, while G1 pauses are at most 
around 50ms. So the "experience" with that original web app may be 
better with G1 even if it is slower overall :P (We do not recommend 
running latency-oriented programs at that CPU load level either way, 
just noting.)

> * the stack I showed about finding a new empty page/region for 
> allocation is present in both cases (with JFR and without JFR). But in 
> the case with JFR it is much wider: it accounts for many more samples.

Problems tend to exacerbate themselves, i.e. once the allocation rate 
goes beyond a certain threshold of what the collector can sustain, 
performance can quickly (non-linearly) deteriorate, e.g. because of the 
need to fall back to different, slower algorithms.

Without JFR I am already seeing that almost all GCs are caused by 
allocation stalls. Adding to that will not help.
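
If you want to check this yourself, a sketch (the exact log lines differ 
a bit between JDK versions) is to run the reproducer with basic GC 
logging enabled:

   # look for GC causes / messages mentioning "Allocation Stall" in the output
   time java -Xmx4g -XX:+UseZGC -Xlog:gc \
        -classpath target/classes poc.java.perf.write.TestPerf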

Looking around in the ZGC logs a bit, with StartFlightRecording there 
seems to be much more so-called in-place object movement (i.e. instead 
of copying live objects to a new place and then freeing the old, now 
empty space, the objects are moved "down" the heap to fill gaps), which 
is a lot more expensive.

This shows up in the garbage collection pauses, which go from hundreds 
of ms to seconds.
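
If you want to look for this in your own logs, a sketch of enabling the 
relocation logging and comparing runs with and without 
-XX:StartFlightRecording (the gc+reloc tag selection is my assumption; 
-Xlog:gc* is a coarser but safe alternative):

   # gc+reloc is an assumption; -Xlog:gc* is a coarser but safe fallback
   java -Xmx4g -XX:+UseZGC -XX:StartFlightRecording \
        -Xlog:gc+reloc=debug -classpath target/classes \
        poc.java.perf.write.TestPerf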

As mentioned above, it looks like just that little extra memory usage 
causes ZGC to go into some very slow mode to free memory and avoid OOME.

Hth,
   Thomas

[0] https://bugs.openjdk.org/browse/JDK-8238686
[1] 
https://docs.oracle.com/en/java/javase/25/gctuning/garbage-first-garbage-collector-tuning.html

> 
> Best regards,
> 
> Fabrice
> 
> 
> On 2026-01-12 14:18, Thomas Schatzl wrote:
> 
>> Hi,
>>
>>   while not being able to answer the question about why using JFR 
>> takes so much additional time, when reading about your benchmark setup 
>> the following things came to my mind:
>>
>> * -XX:+UseCompressedOops for ZGC does nothing (ZGC does not support 
>> compressed oops at all), and G1 will automatically use it. You can 
>> leave it off.
>>
>> * G1 having a significantly worse throughput than ZGC is very rare, 
>> and even then the extent you show is quite large. Taking some of the 
>> content together (4g heap, Maps, huge string variables) indicates that 
>> you might have run into a well-known pathology of G1 with large 
>> objects: the application might waste up to 50% of the heap due to 
>> these humongous objects [0].
>> G1 might work better in JDK 26 too as an enhancement for one 
>> particular case has been added. More is being worked on.
>>
>> TL;DR: Your application might run much better with a large(r) 
>> G1HeapRegionSize setting. Or just upgrading to JDK 26.
>>
>> * While ZGC does not have that (in some cases extreme) memory wastage 
>> for large allocations, there is still some. Adding JFR might just push 
>> it over the edge (the stacks you showed are about finding a new empty 
>> page/region for allocation, failing to do so, doing a GC, stalling and 
>> waiting).
>>
>> Hth,
>>   Thomas
>>
>> [0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html
>>
>> On 11.01.26 19:23, Fabrice Bibonne wrote:
>>> Hi all,
>>>
>>>   I would like to report a case where starting jfr for an application 
>>> running with zgc causes a significant throughput degradation 
>>> (compared to when JFR is not started).
>>>
>>>   My context: I was writing a little web app to illustrate a case 
>>> where the use of ZGC gives better throughput than G1. I benchmarked 
>>> my application with Grafana k6, running with G1 and running with 
>>> ZGC: the runs with ZGC gave better throughput. I wanted to go a bit 
>>> further in the explanation, so I ran my benchmarks again with JFR to 
>>> be able to illustrate the GC gains in JMC. When I ran my web app with 
>>> ZGC+JFR, I noticed a significant throughput degradation in my 
>>> benchmark (which was not the case with G1+JFR).
>>>
>>>   Although I did not measure an increase in overhead as such, I still 
>>> wanted to report this issue because the degradation in throughput 
>>> with JFR is such that it would not be usable as is on a production 
>>> service.
>>>
>>> I wrote a little application (not a web one) to reproduce the 
>>> problem: the application calls a little conversion service 200 times 
>>> with random numbers in parallel (to behave like a web app under load 
>>> and to put pressure on the GC). The conversion service (a method 
>>> named `convertNumberToWords`) converts the number to a String by 
>>> looking the String up in a Map with the number as the key. In order 
>>> to instantiate and destroy many objects on each call, the map is 
>>> built by parsing a huge String on each call. The application ends 
>>> after 200 calls.
>>>
>>> Here are the steps to reproduce:
>>> 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact 
>>> (make sure you are on branch jfr+zgc_impact)
>>> 2. Compile it (you must include numbers200k.zip in resources: it 
>>> contains a 36 MB text file whose contents are used to create the 
>>> huge String variable)
>>> 3. In the root of the repository:
>>> 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops 
>>> -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR`
>>> 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops 
>>> -XX:StartFlightRecording -classpath target/classes 
>>> poc.java.perf.write.TestPerf #ZGC with JFR`
>>> 4. The real time of the second run (with JFR) will be considerably 
>>> higher than that of the first
>>>
>>> I ran these tests on my laptop :
>>> - Dell Inc. Latitude 5591
>>> - openSUSE Tumbleweed 20260108
>>> - Kernel : 6.18.3-1-default (64-bit)
>>> - 12 × Intel® Core™ i7-8850H CPU @ 2.60GHz
>>> - 16 GiB RAM
>>> - openjdk version "25.0.1" 2025-10-21
>>> - OpenJDK Runtime Environment (build 25.0.1+8-27)
>>> - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing)
>>> - many tabs open in Firefox!
>>>
>>> I also ran it in a container (eclipse-temurin:25) on my laptop and 
>>> on a Windows laptop and came to the same conclusions: here are the 
>>> measurements from the container:
>>>
>>> | Run with  | Real time (s) |
>>> |-----------|---------------|
>>> | ZGC alone | 7.473         |
>>> | ZGC + jfr | 25.075        |
>>> | G1 alone  | 10.195        |
>>> | G1 + jfr  | 10.450        |
>>>
>>>
>>> After all these tests I tried to run the app with another profiling 
>>> tool in order to understand where the issue is. I attach the 
>>> flamegraph from running JFR+ZGC: for the worker threads of the 
>>> ForkJoinPool used by Stream, the stack traces of a majority of 
>>> samples have the same top frames:
>>> - PosixSemaphore::wait
>>> - ZPageAllocator::alloc_page_stall
>>> - ZPageAllocator::alloc_page_inner
>>> - ZPageAllocator::alloc_page
>>>
>>> So many threads seem to spend their time waiting in the method 
>>> ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic 
>>> tasks thread also has a few samples where it waits in 
>>> ZPageAllocator::alloc_page_stall. I hope this will help you find 
>>> the issue.
>>>
>>> Thank you very much for reading this email until the end. I hope this 
>>> is the right place for such feedback. Let me know if I should report 
>>> my problem elsewhere. Feel free to ask me more questions if you need.
>>>
>>> Thank you all for this amazing tool !
>>>
>>>


