RFR: 8363620: AArch64: reimplement emit_static_call_stub() [v6]

Fei Gao fgao at openjdk.org
Tue Dec 23 16:54:00 UTC 2025


On Mon, 22 Dec 2025 23:29:39 GMT, Dean Long <dlong at openjdk.org> wrote:

> I'm still unsure about the optimization of rewriting ISB to B. Doesn't that mean the ISB is not really needed? How is an entry point with a branch that used to be an ISB different from a branch that was always there, or no branch at all?

Thanks for asking.

Assume Thread 1 is executing the resolve stub while other threads may be running concurrently. Thread 1 performs the following actions in order:

1. Patch the MOV instructions in the static call stub
2. Invalidate the instruction cache for the patched MOVs
3. Patch the trampoline
4. Patch the call site
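The four steps can be sketched as a single-threaded C++ model. Everything in it is hypothetical: the five-word stub layout (an entry ISB followed by four MOVZ/MOVK words), the choice of register x12, and the names `CodeModel` and `resolve` are illustrative and not taken from HotSpot. The sketch captures only the order of the steps. It cannot exhibit the cross-thread visibility behaviour discussed below.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, single-threaded model of the resolve sequence. Layout and
// names are illustrative only; this shows step order, not memory ordering.
constexpr uint32_t kIsb = 0xD5033FDFu;  // ISB (SY)

// 64-bit MOVZ/MOVK: write a 16-bit chunk (selected by hw) into register xd.
constexpr uint32_t movz(unsigned rd, uint16_t imm16, unsigned hw) {
  return 0xD2800000u | (hw << 21) | (uint32_t(imm16) << 5) | rd;
}
constexpr uint32_t movk(unsigned rd, uint16_t imm16, unsigned hw) {
  return 0xF2800000u | (hw << 21) | (uint32_t(imm16) << 5) | rd;
}

enum class Step { PatchMovs, InvalidateMovs, PatchTrampoline, PatchCallSite };

struct CodeModel {
  std::array<uint32_t, 5> stub{kIsb, 0, 0, 0, 0};  // entry ISB + 4 MOV slots
  uint64_t trampoline_target = 0;  // data word the trampoline branches through
  uint64_t call_site_target = 0;   // where the direct call currently points
  std::vector<Step> log;           // records the order of the steps
};

void resolve(CodeModel& m, uint64_t value, uint64_t stub_addr) {
  // 1. Patch the MOVs: materialize `value` in x12 (arbitrary register choice).
  for (unsigned hw = 0; hw < 4; ++hw) {
    uint16_t chunk = uint16_t(value >> (16 * hw));
    m.stub[1 + hw] = (hw == 0) ? movz(12, chunk, 0) : movk(12, chunk, hw);
  }
  m.log.push_back(Step::PatchMovs);

  // 2. Invalidate the instruction cache for the patched MOVs
  //    (effectively a no-op on hosts with coherent instruction caches).
  char* begin = reinterpret_cast<char*>(&m.stub[1]);
  __builtin___clear_cache(begin, begin + 4 * sizeof(uint32_t));
  m.log.push_back(Step::InvalidateMovs);

  // 3. Patch the trampoline (an indirect branch through a data word).
  m.trampoline_target = stub_addr;
  m.log.push_back(Step::PatchTrampoline);

  // 4. Patch the direct call site last.
  m.call_site_target = stub_addr;
  m.log.push_back(Step::PatchCallSite);
}
```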

The key point is that the trampoline uses an **indirect** branch, whereas the call site uses a **direct** branch, and these have very different visibility guarantees in the Arm memory model.

For a **direct** branch, the sequence
patch MOVs → invalidate patched code → patch direct branch
is sufficient, and no ISB is required: the Arm memory model guarantees that if Thread 2 executes the call site and observes that it has been patched to point to the static call stub, it must also observe the fully updated MOVs in the stub.
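For reference, a direct call site is a single `BL` instruction whose 26-bit immediate encodes the target, so retargeting it is one aligned 32-bit store. The Arm ARM's rules on concurrent modification and execution of instructions list `B`, `BL`, `NOP` and `ISB` among the instructions that may be rewritten while another thread executes them; that thread then sees either the old or the new instruction. The sketch below assumes GCC/Clang builtins (`__atomic_store_n`, `__builtin___clear_cache`), and `patch_call_site` is a hypothetical helper, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Encode "BL <pc + offset>": opcode 0b100101 in bits 31..26, imm26 = offset / 4.
inline uint32_t encode_bl(int64_t offset) {
  assert(offset % 4 == 0);  // instructions are word aligned
  assert(offset >= -(int64_t(1) << 27) && offset < (int64_t(1) << 27));  // +/-128 MiB
  return 0x94000000u | (uint32_t(offset >> 2) & 0x03FFFFFFu);
}

// Hypothetical helper: retarget a direct call site with one aligned 32-bit
// store (release: earlier stores, e.g. to the stub's MOVs, are ordered before
// it for data observers), then invalidate the instruction cache for that word.
inline void patch_call_site(uint32_t* site, int64_t offset_to_stub) {
  __atomic_store_n(site, encode_bl(offset_to_stub), __ATOMIC_RELEASE);
  char* p = reinterpret_cast<char*>(site);
  __builtin___clear_cache(p, p + sizeof(uint32_t));
}
```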

However, no such guarantee exists for an **indirect** branch, such as the one in the trampoline. If Thread 3 reaches the static call stub via the trampoline, observing the updated trampoline target alone does not guarantee that it will observe the fully updated MOVs. How can we guarantee that?
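To make the contrast concrete, here is a hypothetical trampoline in the common literal-pool form: `LDR x16, .+8; BR x16` followed by a 64-bit target. The register and layout are assumptions and not necessarily what HotSpot emits. Retargeting it is an ordinary data store, so Thread 3 learns the stub address through a data load and then branches there without fetching any instruction that Thread 1 modified before reaching the stub itself. That is why seeing the new target says nothing about whether the stub's MOVs are visible to Thread 3's instruction fetch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical trampoline layout: LDR x16, .+8 ; BR x16 ; .quad target
struct Trampoline {
  uint32_t ldr;     // loads the 64-bit literal stored 8 bytes ahead
  uint32_t br;      // indirect branch to the loaded address
  uint64_t target;  // plain data word holding the destination
};
static_assert(offsetof(Trampoline, target) == 8,
              "literal must follow the two instructions");

// LDR Xt, <pc + offset> (64-bit literal load): imm19 = offset / 4.
constexpr uint32_t encode_ldr_literal_x(unsigned rt, int32_t offset) {
  return 0x58000000u | ((uint32_t(offset >> 2) & 0x7FFFFu) << 5) | rt;
}

// BR Xn
constexpr uint32_t encode_br(unsigned rn) { return 0xD61F0000u | (rn << 5); }

inline Trampoline make_trampoline(uint64_t target) {
  return Trampoline{encode_ldr_literal_x(16, 8), encode_br(16), target};
}

// Retargeting touches only the data word; no instruction is modified.
inline void patch_trampoline(Trampoline& t, uint64_t new_target) {
  __atomic_store_n(&t.target, new_target, __ATOMIC_RELEASE);
}
```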

The existing implementation provides this guarantee with an ISB that is always present at the entry of the static call stub.

Is there another way to achieve the same effect? Yes: rewriting the ISB into a direct branch (B .+4) can also establish the required ordering. The sequence
patch MOVs → invalidate patched code → patch ISB to B .+4
establishes the same ordering as patching a **direct** call site: if Thread 3 observes that the original ISB has been rewritten to B .+4, it must also observe the fully updated MOVs.
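A sketch of the proposed rewrite, under the same caveats: the stub layout, `publish_stub`, and the use of GCC/Clang builtins are assumptions for illustration, not the code in this PR. `0xD5033FDF` encodes `ISB` (SY) and `0x14000001` encodes `B .+4` (imm26 = 1). The MOVs are written and their cache lines invalidated first; only then is the entry word changed, with a single aligned 32-bit store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kIsb = 0xD5033FDFu;  // ISB (SY)

// Encode "B <pc + offset>": opcode 0b000101 in bits 31..26, imm26 = offset / 4.
inline uint32_t encode_b(int64_t offset) {
  assert(offset % 4 == 0);
  assert(offset >= -(int64_t(1) << 27) && offset < (int64_t(1) << 27));
  return 0x14000000u | (uint32_t(offset >> 2) & 0x03FFFFFFu);
}

// Hypothetical stub: word 0 is the entry instruction, words 1..4 hold MOVs.
using Stub = std::array<uint32_t, 5>;

static void flush(uint32_t* begin, uint32_t* end) {
  __builtin___clear_cache(reinterpret_cast<char*>(begin),
                          reinterpret_cast<char*>(end));
}

// Publish new MOV words, then turn the entry ISB into "B .+4".
void publish_stub(Stub& stub, const std::array<uint32_t, 4>& movs) {
  // This sketch only models the first publication, while word 0 is still ISB.
  assert(stub[0] == kIsb);
  for (std::size_t i = 0; i < movs.size(); ++i) stub[1 + i] = movs[i];
  flush(&stub[1], stub.data() + stub.size());  // invalidate the patched MOVs
  // Single aligned 32-bit store: a concurrent fetcher sees ISB or B .+4.
  __atomic_store_n(&stub[0], encode_b(4), __ATOMIC_RELEASE);
  flush(&stub[0], &stub[1]);
}
```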

In contrast, if the first instruction in the stub were always B .+4 from the beginning, there would be no observable change at that instruction, and therefore no guarantee that Thread 3 would observe the updated MOVs.

> For example, we could just omit it or have the caller jump to entry point + 4.

Yes, that would work for Thread 2, which reaches the stub through the patched direct call site. However, we still need to address Thread 3, which arrives through the trampoline.

If we do not patch the trampoline while resolving the static call stub, so that the trampoline never points to the stub, the “Thread 3” scenario disappears and we would not need an ISB at all.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26638#issuecomment-3687310295
