We are running a GC-intensive application with an allocation rate of up to 1.5GB/s (4k requests/sec) on a 32-vCore instance (-Xms20G -Xmx20G).

    openjdk 17 2021-09-14
    OpenJDK Runtime Environment (build 17+35-2724)
    OpenJDK 64-Bit Server VM (build 17+35-2724, mixed mode, sharing)

During the migration from Java 15 to Java 17 we see unstable behavior: ZGC goes into an infinite loop and never returns to its normal state.

We see the following behavior:
- It runs perfectly (~30% better than on Java 15): 5-7GB of used memory at 16% CPU utilization.
- Something happens and CPU utilization jumps to 40%, used memory jumps to 19GB, and ZGC cycle time increases about 10x.
- Sometimes the app runs for about 30 minutes before crashing; sometimes it crashes immediately.
- If it is still alive, memory is not released and the app keeps running with 18-20GB of used memory.
- We see allocation stalls.
- We see the following GC stats:

    [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
    [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms

- We see the JVM start to unload classes massively (the jvm_classes_unloaded_total counter keeps increasing).
- We managed to profile the JVM in this 'unstable' state and see ZTask consuming a lot of CPU in CompiledMethod::unload_nmethod_caches(bool). See attached image.

We played with these options:

    -XX:ZCollectionInterval=5
    -XX:SoftMaxHeapSize=12G
    -XX:ZAllocationSpikeTolerance=4
    -XX:MetaspaceReclaimPolicy=aggressive

But they don't help; they only postpone the event. The JVM returns to normal only if we redirect traffic to another node.

Regards,
Anatoly Deyneka