RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GenZGC performance [v3]

Mon Nov 24 21:00:30 UTC 2025

On Mon, 24 Nov 2025 17:49:24 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1.
>>  
>> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround:
>> - Disable coherent icache.
>> - Trap IC IVAU instructions.
>> - Execute:
>>    - `tlbi vae3is, xzr`
>>    - `dsb sy`
>>  
>>  `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address).  It waits for all memory accesses using in-scope old translation information to complete before it is considered complete.
>>  
>> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests:
>> 
>> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround."
>> 
>> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions.
>> 
>> Changes include:
>> 
>> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization.
>> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address.
>> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures.
>> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk.
>> 
>> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2)
>> 
>> - Baseline
>> 
>> $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1...
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Explicitly check Neoverse N1 revisions affected by errata 1542419

Yeah patching all nmethods as one unit is basically equivalent to making the code cache processing a STW operation. Last time we processed the code cache STW was JDK 11. A dark place I don't want to go back to. It can get pretty big and mess up latency. So I'm in favour of limiting the fix and not re-introduce STW code cache processing.

Otherwise yes you are correct; we perform synchronous cross modifying code with no assumptions about instruction cache coherency because we didn't trust it would actually work for all ARM implementations. Seems like that was a good bet. We rely on it on x64 still though.

It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. If so, a horrible alternative might be to use mprotect to temporarily remove execution permission on the affected per nmethod pages, and detect over shooting in the signal handler, resuming execution when execution privileges are then restored immediately after. That should limit the affected VA to close to what is actually invalidated. But it would look horrible.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3572695548