ZGC unstable behaviour on java 17

Anatoly Deyneka adeyneka at gmail.com
Tue Nov 9 13:12:49 UTC 2021


Hi Erik,

We probably never hit this condition on Java 15.
Starting with Java 16 we see a big difference in metaspace allocation.
The behaviour does not change with any value of -XX:MetaspaceReclaimPolicy.

Everything we know about the problem so far is collected here:
https://github.com/adeyneka/ZGC-java-16-17-unstable-behaviour/blob/main/README.md

Unfortunately, we can reproduce this issue only on our production platform.
If more details or logs would help, please let me know and we will provide them.
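
One way such details could be captured alongside the GC logs is sketched below. It samples the class loading counters and Metaspace usage through the standard java.lang.management beans. The class name MetaspaceSnapshot, the output format, and the one-second sampling interval are illustrative assumptions, not something we run in production today; the pool name "Metaspace" is what HotSpot reports and may differ on other JVMs.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Hypothetical helper: prints class loading counters and Metaspace usage.
public class MetaspaceSnapshot {

    // Returns used bytes of the pool named "Metaspace" (HotSpot naming),
    // or -1 if no such pool is exposed by this JVM.
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1L;
    }

    // Formats one sample: currently loaded classes, total unloaded classes
    // since JVM start, and Metaspace bytes in use.
    public static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d unloaded=%d metaspaceUsedBytes=%d",
                classes.getLoadedClassCount(),
                classes.getUnloadedClassCount(),
                metaspaceUsedBytes());
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed sampling plan: three samples, one second apart.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

A steadily growing unloaded count together with a change in Metaspace usage would line up with the jvm_classes_unloaded_total counter mentioned in the original report.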

Regards,
Anatoly Deyneka

On Mon, 8 Nov 2021 at 16:46, Erik Osterlund <erik.osterlund at oracle.com>
wrote:

> Hi Anatoly,
>
> Thanks for reporting this. This looks rather odd indeed. Unfortunately,
> the attached images have been stripped from your original email.
>
> It’s particularly interesting that you see this behaviour when migrating
> from JDK 15 to 17. There have been a few bug fixes, but nothing I can
> easily imagine causing this. I have been surprised before though. One
> notable thing in the class unloading path that has changed though, is the
> elastic metaspace. I see that you use the
> -XX:MetaspaceReclaimPolicy=aggressive option, which did not exist in JDK15.
> This makes me wonder if this reproduces also without that option, i.e. the
> way that you run the workload is more similar.
>
> Anyway, is there any way for me to get access to try to reproduce your
> issue?
>
> Thanks,
> /Erik
>
> > On 4 Nov 2021, at 17:00, Anatoly Deyneka <adeyneka at gmail.com> wrote:
> >
> > We are running a GC-intensive application with an allocation rate of up
> > to 1.5GB/s (4k requests/sec) on a 32 vCore instance (-Xms20G -Xmx20G).
> > openjdk 17 2021-09-14
> > OpenJDK Runtime Environment (build 17+35-2724)
> > OpenJDK 64-Bit Server VM (build 17+35-2724, mixed mode, sharing)
> >
> > During migration from Java 15 to Java 17 we see unstable behavior where
> > ZGC goes into an infinite loop and doesn't return to the normal state.
> > We see the following behavior:
> > - it runs perfectly (~30% better than on Java 15): 5-7GB used memory with
> > 16% CPU utilization
> > - something happens and CPU utilization jumps to 40%, used memory jumps to
> > 19GB, and ZGC cycle times increase tenfold
> > - sometimes the app works for about 30 min before the crash, sometimes it
> > crashes instantly
> > - if it is still alive, memory is not released and the app keeps running
> > with 18-20GB of used memory
> > - we see allocation stalls
> > - we see the following GC stats
> > [17606.140s][info][gc,phases   ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> > [17610.181s][info][gc,stats    ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511  1126.563 / 25769.511   217.882 / 68385.750   217.882 / 68385.750   ms
> > - we see the JVM start to unload classes massively (the
> > jvm_classes_unloaded_total counter keeps increasing)
> > - we managed to profile the JVM in this ‘unstable’ state and see ZTask
> > consuming a lot of CPU in CompiledMethod::unload_nmethod_caches(bool).
> > See attached image.
> >
> > We experimented with the following options:
> > -XX:ZCollectionInterval=5
> > -XX:SoftMaxHeapSize=12G
> > -XX:ZAllocationSpikeTolerance=4
> > -XX:MetaspaceReclaimPolicy=aggressive
> >
> > But none of them helps; they just postpone the event.
> > The node returns to normal only if we redirect traffic to another node.
> >
> > Regards,
> > Anatoly Deyneka
> >
> > [image: image.png]
> > [image: image.png]
> > [image: image.png]
>


More information about the zgc-dev mailing list