RFR: 8363620: AArch64: reimplement emit_static_call_stub()
Fei Gao
fgao at openjdk.org
Mon Oct 27 14:19:43 UTC 2025
On Mon, 27 Oct 2025 12:07:19 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> In the existing implementation, the static call stub typically emits a sequence like:
>> `isb; movz; movk; movk; movz; movk; movk; br`.
>>
>> This patch reimplements it using a more compact and patch-friendly sequence:
>>
>>     ldr x12, Label_data
>>     ldr x8, Label_entry
>>     br x8
>> Label_data:
>>     0x00000000
>>     0x00000000
>> Label_entry:
>>     0x00000000
>>     0x00000000
>>
>> The new approach places the target addresses adjacent to the code and loads them dynamically. This allows us to update the call target by modifying only the data in memory, without changing any instructions. This avoids the need for I-cache flushes or issuing an `isb`[1], which are both relatively expensive operations.
>>
>> While emitting direct branches in static stubs for small code caches can save 2 instructions compared to the new implementation, modifying those branches still requires I-cache flushes or an `isb`. This patch unifies the code generation by emitting the same static stubs for both small and large code caches.
>>
>> A microbenchmark (StaticCallStub.java) shows the average time per operation dropping by approximately 43%.
>>
>>
>> Benchmark                        (length)  Mode  Cnt    Master     Patch  Units
>> StaticCallStubFar.callCompiled       1000  avgt    5    39.346    22.474  us/op
>> StaticCallStubFar.callCompiled      10000  avgt    5    390.05   218.478  us/op
>> StaticCallStubFar.callCompiled     100000  avgt    5  3869.264  2174.001  us/op
>> StaticCallStubNear.callCompiled      1000  avgt    5    39.093    22.582  us/op
>> StaticCallStubNear.callCompiled     10000  avgt    5   387.319   217.398  us/op
>> StaticCallStubNear.callCompiled    100000  avgt    5  3855.825  2206.923  us/op
>>
>>
>> All tests in Tier1 to Tier3, under both release and debug builds, have passed.
>>
>> [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads
>
> OK, this one looks way more practical than the previous version of this PR. As long as we still use only instruction fetches in the stub, it ought to work.
>
> The correctness of this is rather subtle, however, and the change to `set_destination_mt_safe` is downright obscure.
>
> It perhaps would be clearer to leave the first instruction in the call stub as an ISB. When we patched the call stub, we'd change it to branch+1. That might work as well, but be simpler and much less obscure.
@theRealAph Thanks so much for your quick feedback!
> > Then the callee_address in trampoline_stub would no longer be expected to point to static_stub.
>
> This is to ensure that the patched code is only reachable via a direct jump, I guess?
Yes, that’s exactly what I mean.
> OK, this one looks way more practical than the previous version of this PR. As long as we still use only instruction fetches in the stub, it ought to work.
>
> The correctness of this is rather subtle, however, and the change to set_destination_mt_safe is downright obscure.
>
> It perhaps would be clearer to leave the first instruction in the call stub as an ISB. When we patched the call stub, we'd change it to branch+1. That might work as well, but be simpler and much less obscure.
Let me confirm if I’ve understood you correctly. Initially, the static call stub looks like this:
[main code]
  L0:
    bl trampoline_stub
    ... {post call}
  static_stub:
    isb
    mov x12, #0x0
    movk x12, #0x0, lsl #16
    movk x12, #0x0, lsl #32
    mov x8, #0x0
    movk x8, #0x0, lsl #16
    movk x8, #0x0, lsl #32
    br x8
After the stub is patched, it becomes:
[main code]
  L0:
    bl static_stub
    ... {post call}
  static_stub:
    b entry_point
  entry_point:
    mov x12, #0x0
    movk x12, #0x0, lsl #16
    movk x12, #0x0, lsl #32
    mov x8, #0x0
    movk x8, #0x0, lsl #16
    movk x8, #0x0, lsl #32
    br x8
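
To make the `isb` to `b entry_point` swap concrete, here is a minimal C++ sketch of the one-word patch. It is not HotSpot code: `encode_b` and `patch_first_insn` are hypothetical names used only for illustration, and the sketch assumes the standard A64 encodings (`isb` is `0xd5033fdf`; `b` is `0x14000000` plus a signed 26-bit offset counted in 4-byte words). It deliberately leaves out any cache maintenance or cross-thread synchronization a real patch would have to consider.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// A64 encoding of `isb` (ISB SY).
constexpr uint32_t kIsb = 0xd5033fdfu;

// Encode an unconditional `b` whose target lies `byte_offset` bytes from the
// branch instruction itself. Opcode 0b000101 sits in bits [31:26]; the low
// 26 bits hold the signed offset in units of 4 bytes.
constexpr uint32_t encode_b(int32_t byte_offset) {
  return 0x14000000u |
         ((static_cast<uint32_t>(byte_offset) >> 2) & 0x03ffffffu);
}

// Hypothetical helper: if the first stub slot still holds `isb`, replace it
// with a branch to the next instruction (offset +4 bytes). Returns false and
// leaves the slot untouched if it holds anything else.
bool patch_first_insn(std::atomic<uint32_t>& slot) {
  uint32_t expected = kIsb;
  return slot.compare_exchange_strong(expected, encode_b(4),
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

With these encodings the patched word is `0x14000001`, i.e. a `b` that lands on the instruction immediately after it, which is what the `b entry_point` line in the listing above stands for.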
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26638#issuecomment-3451522233