ZGC unstable behaviour on Java 17
Anatoly Deyneka
adeyneka at gmail.com
Thu Nov 4 16:00:33 UTC 2021
We are running a GC-intensive application with an allocation rate of up to
1.5 GB/s (4k requests/sec) on a 32 vCPU instance (-Xms20G -Xmx20G).
openjdk 17 2021-09-14
OpenJDK Runtime Environment (build 17+35-2724)
OpenJDK 64-Bit Server VM (build 17+35-2724, mixed mode, sharing)
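To put the workload figures above in perspective, here is a back-of-the-envelope calculation of the per-request allocation and of how long the free part of the heap lasts at the peak rate. It is a sketch under simplifying assumptions that are not part of the original report: decimal units (1 GB = 1e9 bytes, so -Xmx20G is treated as 20e9 bytes) and a sustained 1.5 GB/s peak.

```java
/**
 * Back-of-the-envelope numbers for the workload described in the post.
 * Assumptions: decimal units (1 GB = 1e9 bytes) and a sustained 1.5 GB/s peak;
 * these are simplifications, not measurements.
 */
public class AllocationBudget {
    static final double ALLOC_BYTES_PER_SEC = 1.5e9; // peak allocation rate from the post
    static final double REQUESTS_PER_SEC = 4_000;    // request rate from the post
    static final double HEAP_BYTES = 20e9;           // -Xmx20G, treated as 20e9 bytes

    /** Average bytes allocated per request at peak load. */
    static double bytesPerRequest() {
        return ALLOC_BYTES_PER_SEC / REQUESTS_PER_SEC;
    }

    /** Seconds until the rest of the heap is consumed, given the bytes currently in use. */
    static double secondsOfHeadroom(double usedBytes) {
        return (HEAP_BYTES - usedBytes) / ALLOC_BYTES_PER_SEC;
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f KB allocated per request%n", bytesPerRequest() / 1e3);
        System.out.printf("~%.1f s of headroom with 7 GB used%n", secondsOfHeadroom(7e9));
        System.out.printf("~%.1f s of headroom with 19 GB used%n", secondsOfHeadroom(19e9));
    }
}
```

Under these assumptions each request allocates roughly 375 KB, and a concurrent cycle has about 8.7 s to finish with 7 GB in use, but well under a second with 19 GB in use, so allocation stalls become likely once cycles slow down.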
While migrating from Java 15 to Java 17, we see unstable behavior: ZGC appears
to get stuck in an infinite loop and never returns to its normal state.
We see the following behavior:
- it runs perfectly (~30% better than on Java 15): 5-7 GB used memory with
16% CPU utilization
- then something happens: CPU utilization jumps to 40%, used memory jumps to
19 GB, and ZGC cycle times increase tenfold
- sometimes the app keeps working for about 30 minutes before it crashes;
sometimes it crashes instantly
- if it survives, memory is not released and the app keeps running with
18-20 GB used
- we see allocation stalls
- we see the following GC stats:
  [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
  [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
- we see the JVM start unloading classes massively (the
jvm_classes_unloaded_total counter keeps increasing)
- we’ve managed to profile the JVM in this ‘unstable’ state and see ZTask
consuming a lot of CPU in CompiledMethod::unload_nmethod_caches(bool).
See the attached image.
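When comparing runs, it can help to extract the timings from the two log lines quoted above automatically. The following is a minimal sketch; the regular expressions are assumptions derived only from those two lines, not from a specification of the unified-logging format. In ZGC's gc,stats table the "avg / max" pairs normally correspond to the Last 10s, Last 10m, Last 10h and Total columns, but the header rows are not part of the quoted line, so treat that mapping as an assumption too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: pull ZGC timings out of gc,phases and gc,stats log lines (format assumed from the post). */
public class ZgcLogScan {
    // e.g. "[17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms"
    static final Pattern PHASE = Pattern.compile(
            "\\[gc,phases\\s*\\]\\s+GC\\((\\d+)\\)\\s+(.+?)\\s+(\\d+(?:\\.\\d+)?)ms");

    // e.g. "[...][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 ... ms"
    static final Pattern STATS = Pattern.compile(
            "\\[gc,stats\\s*\\]\\s+(Subphase|Phase):\\s+(.+?)\\s+(\\d.*)\\s+ms");

    // One "avg / max" pair inside the numeric part of a gc,stats line.
    static final Pattern AVG_MAX = Pattern.compile(
            "(\\d+(?:\\.\\d+)?)\\s*/\\s*(\\d+(?:\\.\\d+)?)");

    /** Duration in ms of a gc,phases line, or -1 if the line does not match. */
    static double phaseMillis(String line) {
        Matcher m = PHASE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(3)) : -1;
    }

    /** All "avg / max" pairs of a gc,stats line, in the order they appear. */
    static List<double[]> avgMaxPairs(String line) {
        List<double[]> pairs = new ArrayList<>();
        Matcher row = STATS.matcher(line);
        if (!row.find()) {
            return pairs; // not a gc,stats phase/subphase row
        }
        Matcher pair = AVG_MAX.matcher(row.group(3));
        while (pair.find()) {
            pairs.add(new double[] {Double.parseDouble(pair.group(1)), Double.parseDouble(pair.group(2))});
        }
        return pairs;
    }

    public static void main(String[] args) {
        String phase = "[17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms";
        String stats = "[17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink "
                + "14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms";
        System.out.println("phase ms: " + phaseMillis(phase));
        for (double[] p : avgMaxPairs(stats)) {
            System.out.println("avg " + p[0] + " / max " + p[1]);
        }
    }
}
```

Feeding a whole log file through these two methods makes it easy to spot when Concurrent Process Non-Strong References and Concurrent Classes Unlink start to dominate the cycle time.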
We experimented with the following options:
-XX:ZCollectionInterval=5
-XX:SoftMaxHeapSize=12G
-XX:ZAllocationSpikeTolerance=4
-XX:MetaspaceReclaimPolicy=aggressive
None of them helps; they only postpone the event.
The node returns to normal only if we redirect traffic to another node.
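To correlate the tuning attempts above with the class-unloading symptom, a small in-process probe could read back the effective values of the flags that were tried and sample the unloaded-class count. This is a sketch, not part of the original setup: it uses the standard HotSpotDiagnosticMXBean and ClassLoadingMXBean, and it assumes (without confirmation from the post) that the jvm_classes_unloaded_total metric is derived from the same ClassLoadingMXBean counter. The class name GcProbe and its methods are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Hypothetical probe: effective GC flags plus class-unloading rate, read via platform MXBeans. */
public class GcProbe {
    /** Effective value of a HotSpot flag, or empty if this JVM does not know the flag. */
    static Optional<String> flag(String name) {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return Optional.of(hs.getVMOption(name).getValue());
        } catch (IllegalArgumentException unknownFlag) {
            return Optional.empty(); // flag does not exist in this JVM build
        }
    }

    /** Classes unloaded per second over the given sampling window. */
    static double unloadRate(long windowMillis) {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        long before = cl.getUnloadedClassCount();
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
        }
        long after = cl.getUnloadedClassCount();
        return (after - before) * 1000.0 / windowMillis;
    }

    public static void main(String[] args) {
        String[] names = {"UseZGC", "SoftMaxHeapSize", "ZCollectionInterval",
                "ZAllocationSpikeTolerance", "MetaspaceReclaimPolicy"};
        for (String name : names) {
            System.out.println(name + " = " + flag(name).orElse("<not available>"));
        }
        System.out.println("classes unloaded/s: " + unloadRate(10_000));
    }
}
```

Running the probe on a healthy node and on a node in the unstable state would show whether the unloading rate actually rises before CPU and heap usage jump.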
Regards,
Anatoly Deyneka
[image: image.png]
[image: image.png]
[image: image.png]
More information about the zgc-dev mailing list