RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v3]
Evgeny Astigeevich
eastigeevich at openjdk.org
Wed Dec 3 15:13:21 UTC 2025
On Tue, 25 Nov 2025 13:04:55 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Yeah patching all nmethods as one unit is basically equivalent to making the code cache processing a STW operation. Last time we processed the code cache STW was JDK 11. A dark place I don't want to go back to. It can get pretty big and mess up latency. So I'm in favour of limiting the fix and not re-introduce STW code cache processing.
>>
>> Otherwise yes you are correct; we perform synchronous cross modifying code with no assumptions about instruction cache coherency because we didn't trust it would actually work for all ARM implementations. Seems like that was a good bet. We rely on it on x64 still though.
>>
>> It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. If so, a horrible alternative might be to use mprotect to temporarily remove execution permission on the affected per nmethod pages, and detect over shooting in the signal handler, resuming execution when execution privileges are then restored immediately after. That should limit the affected VA to close to what is actually invalidated. But it would look horrible.
>
>> It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. If so,
>
> "Because the cache-maintenance wasn't needed, we can do the TLBI instead.
> In fact, the I-Cache line-size isn't relevant anymore, we can reduce
> the number of traps by producing a fake value.
>
> "For user-space, the kernel's work is now to trap CTR_EL0 to hide DIC,
> and produce a fake IminLine. EL3 traps the now-necessary I-Cache
> maintenance and performs the inner-shareable-TLBI that makes everything
> better."
>
> My interpretation of this is that we only need to do the synchronization dance once, at the end of the patching. But I guess we don't know exactly if we have an affected core or if the kernel workaround is in action.
@theRealAph @fisk @shipilev
I have updated all places to use optimized icache invalidation. Could you please have a look?
I am running different tests and benchmarks.
@fisk @shipilev
- I added `nmethod::has_non_immediate_oops`. I think it's easy to detect them when we generate code. If this is OK, we might need to update `ZNMethod::attach_gc_data` and `ShenandoahNMethod::detect_reloc_oops`.
- Code of `G1NMethodClosure::do_evacuation_and_fixup(nmethod* nm)` looks strange:
    _oc.set_nm(nm);
    // Evacuate objects pointed to by the nmethod
    nm->oops_do(&_oc);
    if (_strong) {
      // CodeCache unloading support
      nm->mark_as_maybe_on_stack();
      BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod();
      bs_nm->disarm(nm);
    }
    ICacheInvalidationContext icic(nm->has_non_immediate_oops());
    nm->fix_oop_relocations();
If `_strong` is true, we disarm `nm` and then patch it with `fix_oop_relocations`. I have assertions checking that we can defer icache invalidation. None of them is triggered. I think this path always happens at a safepoint.
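
To make the deferral idea concrete, here is a minimal standalone sketch of a scoped invalidation context that performs one flush at scope exit instead of one per patched site. It is not the code from the PR: `DeferredICacheScope`, `FlushStats`, `flush_icache`, `patch_sites` and the meaning of the `defer` flag are all hypothetical names and assumptions chosen for illustration, and the flush is simulated with a counter rather than a real cache-maintenance call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping so the number of flushes is observable.
struct FlushStats {
  int flush_calls = 0;
  std::size_t bytes_flushed = 0;
};

// Stand-in for a real icache flush over [start, start + size).
// It only counts; a real implementation would issue cache maintenance.
inline void flush_icache(FlushStats& stats, std::uintptr_t start, std::size_t size) {
  (void)start;
  stats.flush_calls++;
  stats.bytes_flushed += size;
}

// RAII scope (assumed design): with defer == true, each patched site only
// widens a pending range, and a single flush covers it at scope exit.
// With defer == false, every patched site is flushed immediately.
class DeferredICacheScope {
 public:
  DeferredICacheScope(FlushStats& stats, bool defer)
      : _stats(stats), _defer(defer) {}

  ~DeferredICacheScope() {
    if (_defer && _lo < _hi) {
      flush_icache(_stats, _lo, static_cast<std::size_t>(_hi - _lo));
    }
  }

  DeferredICacheScope(const DeferredICacheScope&) = delete;
  DeferredICacheScope& operator=(const DeferredICacheScope&) = delete;

  // Record that `size` bytes at `addr` were patched.
  void patched(std::uintptr_t addr, std::size_t size) {
    assert(size > 0);
    if (!_defer) {
      flush_icache(_stats, addr, size);
      return;
    }
    if (addr < _lo) _lo = addr;
    if (addr + size > _hi) _hi = addr + size;
  }

 private:
  FlushStats& _stats;
  bool _defer;
  std::uintptr_t _lo = UINTPTR_MAX;  // empty range until something is patched
  std::uintptr_t _hi = 0;
};

// Patch `count` 4-byte sites spaced 64 bytes apart and report the flush work.
inline FlushStats patch_sites(bool defer, int count) {
  FlushStats stats;
  {
    DeferredICacheScope scope(stats, defer);
    for (int i = 0; i < count; i++) {
      std::uintptr_t addr = 0x1000 + 64 * static_cast<std::uintptr_t>(i);
      scope.patched(addr, 4);
    }
  }  // a deferred flush, if any, happens here
  return stats;
}
```

In this sketch, patching eight sites eagerly costs eight flush calls, while the deferred scope issues one call over the enclosing range. The trade-off is that the single range also covers the unpatched bytes between sites, which only pays off when each flush call is expensive, as it appears to be when every call traps.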
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3607330040