RFR: 8363620: AArch64: reimplement emit_static_call_stub()

Wed Aug 6 10:36:04 UTC 2025

On Wed, 6 Aug 2025 09:30:07 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> Unaligned memory access may be non-atomic, although in this case, other threads aren’t modifying the data.
>> 
>> I thought other threads are modifying the data, which is the reason why we needed the ISB.
>
>> > Unaligned memory access may be non-atomic, although in this case, other threads aren’t modifying the data.
>> 
>> I thought other threads are modifying the data, which is the reason why we needed the ISB.
> 
> Yes, exactly. Other threads are executing this nmethod while we are modifying its static call stub.
> 
> We hold the lock while we're doing this, so any thread racing with us to _modify_ the call stub would be blocked. However, another thread could execute the call we just patched but not fully observe the changes we made to the stub, which is why we need the ISB.
> 
> We have a similar situation with this PR, but with data rather than instruction memory. We need to make sure that any racing thread observes changes to the stub before it observes the patched call to the stub. There is no ordering between instruction and data memory, which is why the current stub uses only instruction memory, keeping everything right. The ISB ensures that the executing thread observes the changed stub: there is a happens-before from the `ICache::invalidate_range` after patching the stub (but before patching the call) to the `isb` at the start of the stub.
> 
> For this PR to be correctly synchronized, we'd need a similar happens-before from patching the constants used in the stub to executing the stub. I can think of a way to do that, but it's fiddly and may not be worth it.

Yes, it's a very subtle point that the isb serves to isolate threads executing the call from partial updates by of the writing thread i.e. make the change to the call site atomic. It might be useful to have that written in some comment in the patching code rather than just documented in this issue (but it helps to have it said here).

I understand that an isb is relatively expensive and I don't doubt the figures provided from running the micro-benchmark. However, they are not necessarily any indication that there is a problem here. How many such call sites are there and how often do they get patched? Is that significant relative relative to, say, the number of times they are executed or some other measure of overall cost? Is there evidence that we really need to fix this?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26638#discussion_r2256731349