RFR: 6203: Add ZGC allocation stall rule
Jean-Philippe Bempel
jpbempel at openjdk.org
Wed Jul 16 12:00:50 UTC 2025
On Wed, 9 Jul 2025 14:17:10 GMT, Suchita Chaturvedi <schaturvedi at openjdk.org> wrote:
> This enhancement is to add new rule for ZGC Allocation Stall events.
>
> The default configuration:
>
> <img width="946" alt="image" src="https://github.com/user-attachments/assets/0d39ae26-fdc4-49e3-a0ed-fb7f7da8f709" />
>
> Here are few screenshots for reference:
>
> <img width="344" alt="image" src="https://github.com/user-attachments/assets/e7efe1e2-d6b3-4a05-8ea6-1bf2e8b8c15f" />
>
> <img width="353" alt="image" src="https://github.com/user-attachments/assets/a3d3862f-96d2-4292-947f-4562c7f9f3d3" />
>
> <img width="344" alt="image" src="https://github.com/user-attachments/assets/9616eea6-bce8-4395-a846-db343fe349f2" />
>
> <img width="341" alt="image" src="https://github.com/user-attachments/assets/376a100e-ac05-48fe-a9c3-754923c3d79e" />
>
> <img width="353" alt="image" src="https://github.com/user-attachments/assets/cfa7c5f1-08f0-44b4-a49d-28504439a631" />
>
> Ignored
>
> <img width="352" alt="image" src="https://github.com/user-attachments/assets/3c00c22e-e64d-46b0-8b64-ed653ed1fc4b" />
>
> If we change default configuration as below:
>
> <img width="367" alt="image" src="https://github.com/user-attachments/assets/8da64985-8446-44b8-bfb4-696a726b5255" />
>
> <img width="350" alt="image" src="https://github.com/user-attachments/assets/ad89f02e-39b4-4df4-aae7-43cec9cc9a91" />
>
> <img width="347" alt="image" src="https://github.com/user-attachments/assets/b3ec1081-26dd-4391-9e0f-0b4c64a7e3c1" />
>
> <img width="344" alt="image" src="https://github.com/user-attachments/assets/106a7c3e-aaaf-495a-8c6f-281c96371e6e" />
core/org.openjdk.jmc.flightrecorder.rules.jdk/src/main/resources/org/openjdk/jmc/flightrecorder/rules/jdk/messages/internal/messages.properties line 755:
> 753: VMOperationRuleFactory_TEXT_WARN_LONG_COMBINED_DURATION=There are long lasting blocking VM operations in this recording. The longest was created from multiple close consecutive operations that were of type {longestOperation} and lasted for {longestOperationDuration} in total. They were initiated from thread {longestOperationCaller} and started at {longestOperationStartTime}. VM operations are JVM internal operations. Some VM operations are executed synchronously (i.e. will block the calling thread), and some need to be executed at so called safe points. Safe point polling is a cooperative suspension mechanism that halts byte code execution in the JVM. A VM operation occurring at a safe point will effectively be "stopping the world", meaning that no Java code will be executing in any thread while executing VM operations at that safe point. Long lasting VM operations executing at safe points can decrease the responsiveness of an application. If you do find such VM operations, then the type of operation and its caller thread provide vital information to understand why the VM operation happened. To find more details, check if there is an event in the caller thread intersecting this event time wise. Looking at the stack trace for such an event can help determining what caused it. See [Runtime Overview](http://openjdk.java.net/groups/hotspot/docs/RuntimeOverview.html) for further information.
> 754: ZGCAllocationStall_RULE_NAME=ZGC Allocation Stall
> 755: ZgcAllocationStall_TEXT_INFO=In ZGC, a type of concurrent Garbage Collection (GC) algorithm, GC threads run concurrently with application threads, resulting in minimal stop-the-world pauses. However, because these pauses are so brief, application threads may create objects faster than GC threads can reclaim memory. In such cases, the JVM temporarily stops the application threads from creating new objects. This 'stopping of object creation' is known as an "Allocation Stall." \n Allocation Stall occurs due to following reasons: \n 1. Inefficient GC Algorithm: This is often the primary cause of Allocation Stall. Using a non-optimal GC algorithm or improper GC settings for your application's workload can lead to stalling. Earlier versions of ZGC (i.e. single-generation ZGC algorithm), are more prone to Allocation Stalls. \n 2. High Object Allocation Rate: If your application creates objects at a very high rate, it can overwhelm the GC's ability to reclaim memory quickly enough, leading to stalls.\n 3. Memory Fragmentation: Even if there is free memory, fragmentation in the heap can prevent large objects from being allocated, contributing to Allocation Stalls.\n
> because these pauses are so brief, application threads may create objects faster than GC threads can reclaim memory.
Stalls do not happen because the pauses are brief. They happen simply because app threads can create objects faster than the GC cycle is able to release memory. Purely technically, allocating is very cheap, while reclaiming requires traversing the object graph and moving objects, and there are sometimes fewer GC threads than app threads allocating.
> Allocation Stall occurs due to following reasons: \n 1. Inefficient GC Algorithm: This is often the primary cause of Allocation Stall. Using a non-optimal GC algorithm or improper GC settings for your application's workload can lead to stalling.
I would write it like this: I prefer to say that the GC algorithm requires more time to reclaim memory, and therefore we need more GC threads and/or more room in the Java heap to have more time to reclaim before reaching the heap limit.
Non-generational ZGC has a longer GC cycle than generational ZGC, which contributes to having more stalls (the whole object graph needs to be scanned before trying to reclaim memory, while only a fraction of it is scanned in generational mode). So "improper GC settings" is true, but I would suggest naming heap sizing, GC threads, or switching to generational.
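To make the reasoning concrete, here is a back-of-the-envelope sketch in Java. It is a deliberately simplified model, not how ZGC decides to stall a thread: it assumes a constant allocation rate and a fixed GC cycle duration, and the class and method names are hypothetical. It only illustrates why more heap room or a shorter cycle (more GC threads, generational mode) leaves more time to reclaim before the heap limit is reached.

```java
public class AllocationStallSketch {

    /** Seconds until the free heap is exhausted at a constant allocation rate (assumed model). */
    static double secondsUntilHeapFull(double freeHeapMb, double allocRateMbPerSec) {
        return freeHeapMb / allocRateMbPerSec;
    }

    /**
     * In this simplified model, a stall is expected when the heap fills up
     * before the running GC cycle has had time to release memory.
     */
    static boolean stallExpected(double freeHeapMb, double allocRateMbPerSec, double gcCycleSec) {
        return secondsUntilHeapFull(freeHeapMb, allocRateMbPerSec) < gcCycleSec;
    }

    public static void main(String[] args) {
        // 1 GB of headroom, 512 MB/s allocation, 3 s GC cycle: heap fills in 2 s.
        System.out.println(stallExpected(1024, 512, 3.0));
        // More room in the heap: 4 GB of headroom gives 8 s before the limit.
        System.out.println(stallExpected(4096, 512, 3.0));
        // Shorter cycle (e.g. more GC threads or generational): 1.5 s cycle.
        System.out.println(stallExpected(1024, 512, 1.5));
    }
}
```

With these illustrative numbers, only the first case stalls; either growing the headroom or shortening the cycle avoids it.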
> 3. Memory Fragmentation: Even if there is free memory, fragmentation in the heap can prevent large objects from being allocated, contributing to Allocation Stalls.
Even if it is technically possible, I am not sure this is a main issue with ZGC.
-------------
PR Review Comment: https://git.openjdk.org/jmc/pull/664#discussion_r2210166822