Will ZGC Allocation Stall block the whole application during the whole GC phase in JDK17?
Dear ZGC experts,

We are using ZGC on JDK 17 (AWS JDK17, OpenJDK 64-Bit Server VM (17.0.10+8-LTS) for linux-amd64 JRE (17.0.10+8-LTS), built on Feb 6 2024 19:58:14 by "jenkins" with gcc 7.3.1 20180303 (Red Hat 7.3.1-5)). In JFR we found several consecutive Allocation Stall events with a high GC duration (8s+) but low pause times (<< 1ms). Our health check (2s timeout) failed during that period, and there was no safepoint begin JFR event (default 10ms threshold).

I am curious: will a ZGC allocation stall block the application from serving health check requests during the whole phase (8s+)? Otherwise I can't understand why the health check would fail.

Really appreciate your great help in advance!

[three attached images: image.png]

Thanks,
Roy
Hi Roy,

On 2024-08-27 11:44, Roy Zhang wrote:
> Dear ZGC experts,
> We are using ZGC on JDK17 (AWS JDK17, OpenJDK 64-Bit Server VM (17.0.10+8-LTS) for linux-amd64 JRE (17.0.10+8-LTS), built on Feb 6 2024 19:58:14 by "jenkins" with gcc 7.3.1 20180303 (Red Hat 7.3.1-5)), from JFR, we found there are several consecutive Allocation Stall event, high GC duration (8s+) but low pause(<< 1ms), and our health check failed (2s timeout) during that period, no safepoint begin(default 10ms threshold) JFR event.
The events you shared are the Garbage Collection JFR events following an allocation stall, not the stalls themselves. There is a specific event named jdk.ZAllocationStall that, if enabled, provides more information about the actual allocation stalls. Those events will show you which threads are being stalled and for how long.
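As a rough illustration of the point above, the following sketch uses the standard jdk.jfr API to enable jdk.ZAllocationStall for the duration of a workload and then prints which thread stalled and for how long. The class name AllocationStallReport and the helpers formatStall, describeStalls and recordStalls are hypothetical names chosen for this example, and the allocation loop in main is only a placeholder workload. On a JVM that is not running ZGC, or when no stall occurs, the report is simply empty.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingFile;

class AllocationStallReport {
    // Event name as given in the reply above.
    static final String STALL_EVENT = "jdk.ZAllocationStall";

    /** Formats one report line (hypothetical helper for this sketch). */
    static String formatStall(String threadName, Duration stall) {
        return threadName + " stalled for " + stall.toMillis() + " ms";
    }

    /** Reads a JFR file and returns one line per allocation stall event. */
    static List<String> describeStalls(Path recordingFile) throws Exception {
        List<String> lines = new ArrayList<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(recordingFile)) {
            if (!STALL_EVENT.equals(event.getEventType().getName())) {
                continue; // ignore every other event type
            }
            RecordedThread thread = event.getThread(); // may be null
            String name = (thread == null || thread.getJavaName() == null)
                    ? "<unknown thread>"
                    : thread.getJavaName();
            lines.add(formatStall(name, event.getDuration()));
        }
        return lines;
    }

    /** Records while the workload runs, then reports any stalls seen. */
    static List<String> recordStalls(Runnable workload) throws Exception {
        Path file = Files.createTempFile("stalls", ".jfr");
        try (Recording recording = new Recording()) {
            // Enable only the stall event, without a duration threshold.
            recording.enable(STALL_EVENT).withoutThreshold();
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(file);
            return describeStalls(file);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload: short-lived allocations.
        List<String> report = recordStalls(() -> {
            for (int i = 0; i < 10_000; i++) {
                byte[] chunk = new byte[64 * 1024];
                chunk[0] = 1;
            }
        });
        report.forEach(System.out::println);
    }
}
```

If the health-check thread shows up in such a report with a stall longer than the 2s timeout, that would be consistent with the failed checks; if it does not, the cause is more likely elsewhere.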
> I am curious, will ZGC allocation stall block the application to serve health check requests during the whole phase (8s+)? Otherwise I can't understand why the health check will fail?
Allocation stalls are per thread: when an application thread needs to allocate and the allocation can't be satisfied, that thread stalls. This also kicks off a garbage collection with the "Allocation Stall" cause (the ones you see). Other threads can continue to run until they need to do an allocation that can't be satisfied (which hopefully can be avoided by the GC freeing up memory).

You should also keep in mind that when you run into allocation stalls, ZGC is not able to fully keep up with the allocation pressure. You should look at tuning your instance to avoid this. You can try increasing the heap size or allowing ZGC to use more concurrent GC threads. An even better alternative, if possible, would be to try out JDK 21 and use generational ZGC (-XX:+ZGenerational). The generational version of ZGC can usually handle high allocation pressure better than single-generation ZGC.

Hope this helps,
StefanJ
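When trying the tuning options mentioned above, it can help to confirm from inside the running JVM which values actually took effect. The sketch below is one way to do that, assuming a HotSpot JVM: it reads a few flags through HotSpotDiagnosticMXBean. The class name VmFlagCheck and the helper readFlag are hypothetical. The flags queried correspond to the heap size (set with -Xmx), the concurrent GC thread count (-XX:ConcGCThreads), and generational mode (-XX:+ZGenerational, which does not exist on JDK 17 and is reported as "n/a" there).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class VmFlagCheck {
    /** Returns the current value of a VM flag, or "n/a" if this JVM has no such flag. */
    static String readFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the flag does not exist in this JVM version or build.
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseZGC", "MaxHeapSize", "ConcGCThreads", "ZGenerational"};
        for (String flag : flags) {
            System.out.println(flag + " = " + readFlag(flag));
        }
    }
}
```

Running it with the same options as the service, for example with a larger -Xmx or an explicit -XX:ConcGCThreads value of your choosing, shows whether the change was picked up before you compare stall behaviour.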
Thanks Stefan for letting me know all the details, really appreciated!

On Thu, Aug 29, 2024 at 10:53 AM Stefan Johansson <stefan.johansson@oracle.com> wrote:
> [...]
participants (2)
- Roy Zhang
- Stefan Johansson