Hi all,

First, I would like to offer my thanks for your work in creating such an impressive tool for the JVM. We recently integrated ZGC into one of our Java RPC services at Canva and it yielded marked improvements in latency. GC pauses are now effectively non-existent on the service, which has allowed us to push the envelope further on the SLAs we can offer our clients.

For context: the service is an RPC instance built on the Finagle framework, running in a containerized ECS environment with 4 GiB of memory and 1 vCPU. The instance runs on OpenJDK 13.0.8 (Zulu) with the following JVM options enabled:

* -XX:InitialRAMPercentage=75
* -XX:MaxRAMPercentage=75
* -XX:StartFlightRecording:maxsize=100M,settings=profile,filename=/tmp/record.jfr,dumponexit=true
* -XX:-UseBiasedLocking
* -XX:+UnlockExperimentalVMOptions
* -XX:+UseZGC

Additionally, the GC was tuned further to deal with an issue that I will discuss below. These options were added based on the advice provided to Sergey Tselovalnikov in "Experience with ZGC" <https://mail.openjdk.java.net/pipermail/zgc-dev/2020-March/000880.html>:

* -XX:ZAllocationSpikeTolerance=3
* -XX:-ZUncommit
* -XX:SoftMaxHeapSize=2G

With these options, the JVM is allocated the following out of the container's memory:

* Heap: 3 GiB
* Metaspace: 102 MB
* Total: ~3.1 GiB

At peak, each instance processes around 230 requests per second. Each request is accepted, put through some minor transformations using Jackson, and logged to a persistence layer. Each request can create a lot of objects, all very short-lived, since the transform goes through a few intermediate objects. This leads to a consistent average allocation rate of around 112.47 MB/s. However, similar to the problem Sergey described in his message, we have also faced issues with ZGC in this environment, even with the suggested tuning options applied.
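For reference, here are all of the above flags assembled into a single launch command. This is just a sketch of our setup: the artifact name ("service.jar") is a placeholder, not our actual service.

```shell
# All JVM flags listed above, combined. "service.jar" is a placeholder.
java \
  -XX:InitialRAMPercentage=75 \
  -XX:MaxRAMPercentage=75 \
  -XX:StartFlightRecording:maxsize=100M,settings=profile,filename=/tmp/record.jfr,dumponexit=true \
  -XX:-UseBiasedLocking \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+UseZGC \
  -XX:ZAllocationSpikeTolerance=3 \
  -XX:-ZUncommit \
  -XX:SoftMaxHeapSize=2G \
  -jar service.jar
```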
The issue is that ZGC infrequently spends a large portion of time in "Subphase: Concurrent Classes Unload", and uses the majority of the container's CPU allocation during these phases. Oftentimes there will be sequential long phases that slow the instance down for 15-30 seconds, causing an increase in latency. I've provided two example lines from the GC logs and can provide more if needed:

[2021-08-24T14:17:45.464+0000][info][gc,stats ] Subphase: Concurrent Classes Unload 13082.074 / 13082.074 392.443 / 13082.074 272.530 / 28215.293 272.530 / 28215.293 ms
[2021-08-24T14:17:45.464+0000][info][gc,stats ] Subphase: Concurrent Mark 410.997 / 410.997 147.171 / 410.997 154.437 / 610.867 154.437 / 610.867 ms

These long phases lead to allocation stalls, as well as a lot of "ICBufferFull" logs during the subphase:

[2021-08-24T14:17:40.898+0000][info][gc ] Allocation Stall (server-1) 372.703ms

The average time taken for this phase is usually much lower, around 350 ms. I am hoping there is a lever I can pull to perhaps increase the overall average time of this phase in exchange for decreasing its maximum time. Our current mitigation is to provide a bigger CPU overhead to reduce these phases, and to investigate how we can reduce the allocation rate to alleviate the problem.

I am happy to provide further information and logs on the scenario if they would help. It has been a great experience to dig deeper into this issue and learn more about the JVM and ZGC, and I look forward to continuing this learning. Thanks for working on this awesome GC!

Cheers,
Jack
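P.S. In case it's useful to anyone digging through similar logs: here is a rough sketch of how the worst-case subphase time can be pulled out of "gc,stats" lines like the ones above. This is a hypothetical helper, not part of our service; the line layout (four "avg / max" pairs followed by "ms") is assumed from the samples in this message.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZgcStatsGrep {
    // Assumed layout: "Subphase: <name> a/m a/m a/m a/m ms",
    // where the final pair's "m" is the max over the whole run.
    private static final Pattern LINE = Pattern.compile(
            "Subphase:\\s+(.+?)\\s+((?:[0-9.]+\\s*/\\s*[0-9.]+\\s*){4})ms");

    /** Returns the overall max (last "/ max" value) for the named subphase, or -1 if absent. */
    static double overallMaxMs(List<String> logLines, String subphase) {
        double max = -1;
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(1).trim().equals(subphase)) {
                // Tokens alternate number, "/", number; the last token is the overall max.
                String[] tokens = m.group(2).trim().split("\\s+");
                max = Math.max(max, Double.parseDouble(tokens[tokens.length - 1]));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "[2021-08-24T14:17:45.464+0000][info][gc,stats ] Subphase: Concurrent Classes Unload "
                        + "13082.074 / 13082.074 392.443 / 13082.074 272.530 / 28215.293 272.530 / 28215.293 ms",
                "[2021-08-24T14:17:45.464+0000][info][gc,stats ] Subphase: Concurrent Mark "
                        + "410.997 / 410.997 147.171 / 410.997 154.437 / 610.867 154.437 / 610.867 ms");
        System.out.println(overallMaxMs(sample, "Concurrent Classes Unload")); // prints 28215.293
    }
}
```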