ZGC unstable behaviour on Java 17

4 Nov 2021

We are running a GC-intensive application with an allocation rate of up to
1.5GB/s (4k requests/sec) on a 32-vCPU instance (-Xms20G -Xmx20G).
openjdk 17 2021-09-14
OpenJDK Runtime Environment (build 17+35-2724)
OpenJDK 64-Bit Server VM (build 17+35-2724, mixed mode, sharing)

During the migration from Java 15 to Java 17 we see unstable behaviour: ZGC
seems to go into an infinite loop and does not return to its normal state.
We see the following behaviour:
- it runs perfectly (~30% better than on Java 15): 5-7GB used memory with
16% CPU utilization
- something happens and CPU utilization jumps to 40%, used memory jumps to
19GB, and ZGC cycle times increase 10-fold
- sometimes the app works for about 30 minutes before it crashes, sometimes
it crashes instantly
- if it is still alive, memory is not released and the app keeps running
with 18-20GB of used memory
- we see allocation stalls
- we see the following GC stats:
[17606.140s][info][gc,phases   ] GC(719) Concurrent Process Non-Strong References 25781.928ms
[17610.181s][info][gc,stats    ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511  1126.563 / 25769.511   217.882 / 68385.750   217.882 / 68385.750   ms
- we see the JVM start to unload classes massively (the
jvm_classes_unloaded_total counter keeps increasing)
- we managed to profile the JVM in this ‘unstable’ state and see ZTask
consuming a lot of CPU in CompiledMethod::unload_nmethod_caches(bool).
See attached image.
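
For anyone trying to reproduce the class-unloading symptom without our
metrics stack, here is a minimal sketch that polls the standard platform
MXBeans. The class name ClassUnloadProbe and the one-second polling
interval are made up for illustration; we assume that a counter such as
jvm_classes_unloaded_total is ultimately fed by
ClassLoadingMXBean.getUnloadedClassCount(), which may not hold for every
metrics exporter.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: not part of the application described above.
public class ClassUnloadProbe {

    /** Total number of classes unloaded since JVM start. */
    static long unloadedClasses() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return cl.getUnloadedClassCount();
    }

    /** One line with class-loading counters and per-collector counters. */
    static String snapshot() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        StringBuilder sb = new StringBuilder();
        sb.append("loaded=").append(cl.getLoadedClassCount())
          .append(" unloaded=").append(cl.getUnloadedClassCount());
        // Collector names depend on the GC in use and on the JDK version.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(" | ").append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        long previous = unloadedClasses();
        // Poll a few times; an arbitrary interval chosen for the sketch.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long current = unloadedClasses();
            System.out.println(snapshot() + " | unloadedDelta=" + (current - previous));
            previous = current;
        }
    }
}
```

A steadily growing unloadedDelta while the load is constant would match
what we observe in the 'unstable' state.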

We experimented with the following options:
-XX:ZCollectionInterval=5
-XX:SoftMaxHeapSize=12G
-XX:ZAllocationSpikeTolerance=4
-XX:MetaspaceReclaimPolicy=aggressive
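
To confirm that the running JVM actually picked these flags up, something
along the following lines can be used. It is only a sketch: the class name
FlagCheck is made up, and it assumes a HotSpot JVM where
com.sun.management.HotSpotDiagnosticMXBean is available. Flags that a given
JDK build does not know are reported as unavailable rather than failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the effective value of selected VM flags.
public class FlagCheck {

    static final String UNAVAILABLE = "<not available in this JVM>";

    /** Returns the flag's current value, or a marker if the JVM lacks it. */
    static String effectiveValue(String flag) {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hs.getVMOption(flag).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with this name exists.
            return UNAVAILABLE;
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "ZCollectionInterval",
            "SoftMaxHeapSize",
            "ZAllocationSpikeTolerance",
            "MetaspaceReclaimPolicy"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + effectiveValue(flag));
        }
    }
}
```

Running it with the same command-line options as the application shows
whether each setting took effect.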

But none of them helps; they only postpone the event.
The JVM returns to normal only if we redirect traffic to another node.

Regards,
Anatoly Deyneka

[image: image.png]
[image: image.png]
[image: image.png]