RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GenZGC performance [v2]

Mon Nov 24 15:49:29 UTC 2025

On Sun, 23 Nov 2025 14:34:44 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > > If I understand correctly, the whole icache is flushed, so the actual nmethod* is irrelevant. So instead of `ICacheInvalidationContext icic(nm)` for every different "nm", can't we just do `ICacheInvalidationContext icic(true)` one time, outside the nmethod loop?
> > 
> > 
> > We can't disarm an nmethod before flushing the instructions.

I don't think we flush the whole icache. We invalidate all translation entries (all VAs, all possible levels). I have not found any information saying that this flushes the icache. I think TLBI is used as a heavyweight serialization barrier: it might force cores to synchronize their instruction fetch streams. We broadcast a TLB invalidation and wait for its completion. I think hardware data and instruction cache coherence still works.

I also found https://lore.kernel.org/linux-arm-kernel/20191017174300.29770-1-james.morse@arm.com/ with more details on the erratum workaround. These details appear consistent with the hypothesis that the TLBI acts as a synchronization event to enforce ordering.

The problem is:
> Neoverse-N1 cores with the 'COHERENT_ICACHE' feature may fetch stale
> instructions when software depends on prefetch-speculation-protection
> instead of explicit synchronization.

Prefetch-speculation-protection:
> JIT can generate new instructions at some new location, then update a
> branch in the executable instructions to point at the new location.
> 
> Prefetch-speculation-protection guarantees that if another CPU sees
> the new branch, it also sees the new instructions that were written
> there.

I think that in the case of armed/disarmed nmethods we have explicit synchronization, not prefetch-speculation-protection. No thread executes an armed nmethod. If I am correct, disarming is the step that releases an nmethod and allows it to be executed.

> 
> Sure, but you can't patch an nmethod until every thread that might be executing it has stopped. So if the threads are all stopped, why not postpone the disarmament until the end, just before you flush?

If my understanding is correct, we cannot disarm before flushing because disarming is like a release of a critical section. We must guarantee all changes we've made are visible to all observers when we leave the critical section.
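
To illustrate the release analogy, here is a minimal sketch in plain C++. It is not HotSpot code: `ToyNMethod`, `patch_and_disarm` and `enter_when_disarmed` are hypothetical names, and an ordinary `int` stands in for the patched instructions. It only shows the ordering I have in mind: patch first, then disarm with release semantics, so that a thread which observes the disarmed state also observes the patch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for an nmethod: 'code' models the patched
// instructions, 'armed' models the entry barrier guard.
struct ToyNMethod {
    int code = 0;
    std::atomic<bool> armed{true};
};

// Patcher: modify the code while the nmethod is armed, then disarm.
void patch_and_disarm(ToyNMethod& nm, int new_code) {
    nm.code = new_code;  // patch inside the "critical section"
    // Assumption: in the real fix, the instruction-side synchronization
    // (the invalidation discussed in this thread) would happen here.
    nm.armed.store(false, std::memory_order_release);  // leave the section
}

// Executor: enters only after it has observed the disarmed state.
int enter_when_disarmed(ToyNMethod& nm) {
    while (nm.armed.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return nm.code;  // the acquire load makes the patch visible here
}

// Runs one patcher and one executor and returns what the executor saw.
int run_demo() {
    ToyNMethod nm;
    int seen = -1;
    std::thread executor([&] { seen = enter_when_disarmed(nm); });
    patch_and_disarm(nm, 42);
    executor.join();
    return seen;
}
```

If the disarming store were moved before the patch, the executor could enter and read the old value, which is the data-side analogue of executing stale instructions.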

As I wrote in the JBS issue, we can:
- Get all nmethods armed
- Patch all of them 
- Invalidate TLB
- Get all nmethods disarmed
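
The difference between the two orderings can be sketched with a toy model. Everything below is hypothetical (`ModelNMethod`, `Stats`, `invalidate`, `per_nmethod`, `batched` are not HotSpot APIs); the model assumes nmethods are already armed when the per-nmethod loop starts, and it only counts how many invalidations each ordering issues.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an nmethod; assumed to start out armed.
struct ModelNMethod {
    bool armed = true;
    bool patched = false;
};

struct Stats {
    std::size_t invalidations = 0;
};

// Stand-in for the broadcast TLB invalidation used by the workaround.
void invalidate(Stats& s) { ++s.invalidations; }

// Per-nmethod ordering: patch, invalidate, then disarm, for each nmethod.
Stats per_nmethod(std::vector<ModelNMethod>& nms) {
    Stats s;
    for (auto& nm : nms) {
        nm.patched = true;
        invalidate(s);      // one invalidation per nmethod
        nm.armed = false;   // disarm only after the invalidation
    }
    return s;
}

// Batched ordering from the list above: arm all, patch all,
// invalidate once, disarm all.
Stats batched(std::vector<ModelNMethod>& nms) {
    Stats s;
    for (auto& nm : nms) nm.armed = true;
    for (auto& nm : nms) nm.patched = true;
    invalidate(s);          // a single invalidation for the whole batch
    for (auto& nm : nms) nm.armed = false;
    return s;
}
```

In this model, N nmethods cost N invalidations in the first ordering and one in the second, but the second keeps every nmethod armed until the whole batch has been patched.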

This will complicate the fix a lot, and the performance gain is not worth it. I measured the theoretical performance when we don't do any invalidation at all: it is 3% - 4% better than the approach in this PR, which invalidates the TLB per nmethod. Batching cannot gain more than that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3571471651