RFR: 8363620: AArch64: reimplement emit_static_call_stub() [v2]
Fei Gao
fgao at openjdk.org
Tue Dec 9 11:09:04 UTC 2025
On Sun, 30 Nov 2025 17:08:22 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Patch 'isb' to 'nop'
>> - Merge branch 'master' into reimplement-static-call-stub
>> - 8363620: AArch64: reimplement emit_static_call_stub()
>>
>> In the existing implementation, the static call stub typically
>> emits a sequence like:
>> `isb; movk; movz; movz; movk; movz; movz; br`.
>>
>> This patch reimplements it using a more compact and patch-friendly
>> sequence:
>> ```
>> ldr x12, Label_data
>> ldr x8, Label_entry
>> br x8
>> Label_data:
>> 0x00000000
>> 0x00000000
>> Label_entry:
>> 0x00000000
>> 0x00000000
>> ```
>> The new approach places the target addresses adjacent to the code
>> and loads them dynamically. This allows us to update the call
>> target by modifying only the data in memory, without changing any
>> instructions. This avoids the need for I-cache flushes or
>> issuing an `isb`[1], which are both relatively expensive
>> operations.
>>
>> While emitting direct branches in static stubs for small code
>> caches can save 2 instructions compared to the new implementation,
>> modifying those branches still requires I-cache flushes or an
>> `isb`. This patch unifies the code generation by emitting the
>> same static stubs for both small and large code caches.
>>
>> A microbenchmark (StaticCallStub.java) shows the average time per
>> operation dropping by approximately 43%.
>>
>> Benchmark                        (length)  Mode  Cnt    Master     Patch  Units
>> StaticCallStubFar.callCompiled       1000  avgt    5    39.346    22.474  us/op
>> StaticCallStubFar.callCompiled      10000  avgt    5    390.05   218.478  us/op
>> StaticCallStubFar.callCompiled     100000  avgt    5  3869.264  2174.001  us/op
>> StaticCallStubNear.callCompiled      1000  avgt    5    39.093    22.582  us/op
>> StaticCallStubNear.callCompiled     10000  avgt    5   387.319   217.398  us/op
>> StaticCallStubNear.callCompiled    100000  avgt    5  3855.825  2206.923  us/op
>>
>> All tests in Tier1 to Tier3, under both release and debug builds,
>> have passed.
>>
>> [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads
>
> Your benchmark doesn't work. Please fix it.
@theRealAph Thanks for your review and for your time!
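
For readers following the thread, the data-patching idea in the quoted description can be modelled roughly as below. This is only a sketch and not HotSpot code: `StubDataModel`, `patch_stub`, `run_stub`, and `demo` are hypothetical names, the two atomics stand in for the words at `Label_data` and `Label_entry`, and the model says nothing about the memory-ordering or cache-maintenance details of the real patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the two 64-bit data words that follow the stub code.
struct StubDataModel {
    std::atomic<std::uint64_t> data{0};   // word read by `ldr x12, Label_data`
    std::atomic<std::uint64_t> entry{0};  // word read by `ldr x8, Label_entry`
};

// Values the stub's two loads would observe before `br x8`.
struct StubLoads {
    std::uint64_t x12;
    std::uint64_t x8;
};

// Retarget the stub by storing to its data words only.
// No instruction bytes are rewritten in this model.
inline void patch_stub(StubDataModel& stub, std::uint64_t data, std::uint64_t entry) {
    stub.data.store(data);
    stub.entry.store(entry);
}

// Model of executing the stub: two loads, in the order the stub issues them.
inline StubLoads run_stub(const StubDataModel& stub) {
    StubLoads loads;
    loads.x12 = stub.data.load();
    loads.x8 = stub.entry.load();
    return loads;
}

// Small usage example: the same "code" sees new values after a data-only patch.
inline void demo() {
    StubDataModel stub;
    patch_stub(stub, 0x1000, 0x2000);
    StubLoads loads = run_stub(stub);
    assert(loads.x12 == 0x1000);
    assert(loads.x8 == 0x2000);
}
```

The point of the model is that retargeting touches only data, so the instruction stream that the loads belong to never changes.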
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26638#issuecomment-3631651452