RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GenZGC performance

Fri Nov 21 23:02:07 UTC 2025

On Thu, 20 Nov 2025 21:23:16 GMT, Dean Long <dlong at openjdk.org> wrote:

> It seems a little disruptive to have to pass `defer_icache_invalidation` around so much. What about attaching this information to the Thread or using a THREAD_LOCAL?

I switched to a THREAD_LOCAL. Initially it regressed fullGC compared to the version with the parameter:
- Parameter:

Benchmark                       (accessedFieldCount)  (methodCount)  Mode  Cnt    Score    Error  Units
GCPatchingNmethodCost.fullGC                       0           5000  avgt    3   88.865 ± 19.299  ms/op
GCPatchingNmethodCost.fullGC                       2           5000  avgt    3  146.184 ± 11.531  ms/op
GCPatchingNmethodCost.fullGC                       4           5000  avgt    3  186.429 ± 16.257  ms/op
GCPatchingNmethodCost.fullGC                       8           5000  avgt    3  262.933 ± 13.071  ms/op

- THREAD_LOCAL:

Benchmark                       (accessedFieldCount)  (methodCount)  Mode  Cnt    Score     Error  Units
GCPatchingNmethodCost.fullGC                       0           5000  avgt    3   93.899 ±  14.870  ms/op
GCPatchingNmethodCost.fullGC                       2           5000  avgt    3  152.872 ±  13.566  ms/op
GCPatchingNmethodCost.fullGC                       4           5000  avgt    3  194.425 ±  37.851  ms/op
GCPatchingNmethodCost.fullGC                       8           5000  avgt    3  271.826 ±  47.908  ms/op
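
For reference, here is a simplified sketch of the two variants being compared. All names (`patch_site_param`, `patch_site_tls`, `IcacheDeferralMark`, `g_defer_icache_invalidation`, `flush_icache_range`) are hypothetical and do not match the HotSpot sources. `flush_icache_range` only counts calls so the behaviour can be checked. The parameter variant threads a `bool` through every call. The THREAD_LOCAL variant sets a per-thread flag once in a scoped guard and reads it at each patch site.

```cpp
#include <cassert>

// Hypothetical counter standing in for the cost of an immediate icache flush.
static int g_flush_count = 0;
static void flush_icache_range() { ++g_flush_count; }

// Variant 1: the caller passes the decision down explicitly.
void patch_site_param(int* site, int value, bool defer_icache_invalidation) {
  *site = value;
  if (!defer_icache_invalidation) {
    flush_icache_range();  // flush now unless the caller flushes once later
  }
}

// Variant 2: the decision lives in a thread-local flag set by a scoped guard.
static thread_local bool g_defer_icache_invalidation = false;

class IcacheDeferralMark {
  bool _saved;
 public:
  IcacheDeferralMark() : _saved(g_defer_icache_invalidation) {
    g_defer_icache_invalidation = true;
  }
  ~IcacheDeferralMark() {
    g_defer_icache_invalidation = _saved;  // restore so nested scopes behave
  }
};

void patch_site_tls(int* site, int value) {
  *site = value;
  if (!g_defer_icache_invalidation) {  // one thread-local load per patch
    flush_icache_range();
  }
}
```

The guard removes the parameter from intermediate signatures, at the price of a thread-local read on every patch.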

I found that `ZBarrierSetAssembler::patch_barrier_relocation` is only used when icache invalidation is deferred, so I replaced a check of the thread-local value with a check of `NeoverseN1Errata1542419`. This restored the performance:

Benchmark                       (accessedFieldCount)  (methodCount)  Mode  Cnt    Score     Error  Units
GCPatchingNmethodCost.fullGC                       0           5000  avgt    3   84.919 ±  31.411  ms/op
GCPatchingNmethodCost.fullGC                       2           5000  avgt    3  141.862 ±   7.026  ms/op
GCPatchingNmethodCost.fullGC                       4           5000  avgt    3  184.921 ±  46.592  ms/op
GCPatchingNmethodCost.fullGC                       8           5000  avgt    3  263.897 ±  48.271  ms/op
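
A rough sketch of that change, under one assumption taken from the comment above: on this path, deferral is in effect exactly when the erratum mitigation is enabled, so a process-wide flag can answer the same question as the thread-local one. `g_errata_1542419` is a hypothetical stand-in for the `NeoverseN1Errata1542419` flag, and the other names are hypothetical as well. A global set once at startup is read with a plain memory load, whereas the cost of a thread-local access depends on the platform and the TLS model in use.

```cpp
#include <cassert>

// Hypothetical stand-in for the NeoverseN1Errata1542419 flag:
// set once at startup, read-only afterwards.
static bool g_errata_1542419 = false;

// Hypothetical thread-local deferral state, as in the THREAD_LOCAL variant.
static thread_local bool g_defer_icache_invalidation = false;

static int g_flush_count = 0;
static void flush_icache_range() { ++g_flush_count; }

void patch_barrier_site(int* site, int value) {
  // Debug-only cross-check of the assumed invariant: the per-thread
  // deferral state matches the global erratum flag on this path.
  assert(g_defer_icache_invalidation == g_errata_1542419);
  *site = value;
  if (!g_errata_1542419) {  // plain global load, no thread-local access
    flush_icache_range();
  }
}
```

Keeping the thread-local only in a debug assertion preserves a check of the invariant without paying for it in release builds.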

It might be that THREAD_LOCAL accesses are expensive on Neoverse N1.

Should I try attaching the information to the Thread instead?
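
To make the Thread option concrete, here is a minimal sketch assuming a heavily simplified `Thread` class with a hypothetical `defer_icache_invalidation` field; none of these names come from HotSpot. The patch site reads the flag through a `Thread*` the caller already holds, so it only avoids a thread-local lookup where such a pointer is already in scope. If the pointer itself has to be obtained through a thread-local lookup, little is saved.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a VM thread object.
class Thread {
 public:
  bool defer_icache_invalidation = false;
};

static int g_flush_count = 0;
static void flush_icache_range() { ++g_flush_count; }

// Scoped guard that records the deferral decision on the thread object.
class IcacheDeferralMark {
  Thread* _thread;
  bool _saved;
 public:
  explicit IcacheDeferralMark(Thread* t)
      : _thread(t), _saved(t->defer_icache_invalidation) {
    _thread->defer_icache_invalidation = true;
  }
  ~IcacheDeferralMark() { _thread->defer_icache_invalidation = _saved; }
};

// Reads the flag through a Thread* the caller already has.
void patch_site(Thread* t, int* site, int value) {
  *site = value;
  if (!t->defer_icache_invalidation) {
    flush_icache_range();
  }
}
```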

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3564915607