RFR: 8363620: AArch64: reimplement emit_static_call_stub() [v3]
Andrew Haley
aph at openjdk.org
Thu Dec 4 11:57:09 UTC 2025
On Thu, 4 Dec 2025 11:48:50 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>>
>> - Update comments and fix benchmarks
>> - The patch is contributed by @theRealAph
>> - Merge branch 'master' into reimplement-static-call-stub
>> - Patch 'isb' to 'nop'
>> - Merge branch 'master' into reimplement-static-call-stub
>> - 8363620: AArch64: reimplement emit_static_call_stub()
>>
>> In the existing implementation, the static call stub typically
>> emits a sequence like:
>> `isb; movk; movz; movz; movk; movz; movz; br`.
>>
>> This patch reimplements it using a more compact and patch-friendly
>> sequence:
>> ```
>> ldr x12, Label_data
>> ldr x8, Label_entry
>> br x8
>> Label_data:
>> 0x00000000
>> 0x00000000
>> Label_entry:
>> 0x00000000
>> 0x00000000
>> ```
>> The new approach places the target addresses adjacent to the code
>> and loads them dynamically. This allows us to update the call
>> target by modifying only the data in memory, without changing any
>> instructions. This avoids the need for I-cache flushes or
>> issuing an `isb`[1], which are both relatively expensive
>> operations.
>>
>> While emitting direct branches in static stubs for small code
>> caches can save 2 bytes compared to the new implementation,
>> modifying those branches still requires I-cache flushes or an
>> `isb`. This patch unifies the code generation by emitting the
>> same static stubs for both small and large code caches.
>>
>> A microbenchmark (StaticCallStub.java) demonstrates a performance
>> uplift of approximately 43%.
>>
>> Benchmark (length) Mode Cnt Master Patch Units
>> StaticCallStubFar.callCompiled 1000 avgt 5 39.346 22.474 us/op
>> StaticCallStubFar.callCompiled 10000 avgt 5 390.05 218.478 us/op
>> StaticCallStubFar.callCompiled 100000 avgt 5 3869.264 2174.001 us/op
>> StaticCallStubNear.callCompiled 1000 avgt 5 39.093 22.582 us/op
>> StaticCallStubNear.callCompiled 10000 avgt 5 387.319 217.398 us/op
>> StaticCallStubNear.callCompiled 100000 avgt 5 3855.825 2206.923 us/op
>>
>> All tests in Tier1 to Tier3, under both release and debug builds,
>> have passed.
>>
>> [1] ht...
>
> src/hotspot/cpu/aarch64/compiledIC_aarch64.cpp line 180:
>
>> 178: nativeCallTrampolineStub_at(trampoline_stub_addr)->set_destination(stub);
>> 179: }
>> 180:
>
> Suggestion:
>
>
> // This code is executed while other threads are running. We must
> // ensure that at all times there is a valid path of execution. A
> // racing thread either observes a call (possibly via a trampoline)
> // to SharedRuntime::resolve_static_call_C or a complete call to the
> // interpreter.
> //
> // If a racing thread observes an updated direct branch at a call
> // site, it must also observe all of the updated instructions in the
> // static interpreter stub.
> //
> // To ensure this, we first update the static interpreter stub, then
> // the trampoline, then the direct branch at the call site.
> // ...
Here's my thinking:
First say what you want to achieve and why.
Be definite about what you say in comments. Avoid words like "generally", for example. Always use the strongest words you can. For example, "because" is stronger than "since".
Be concise. The purpose of all code is to ensure correctness, so you don't have to say so.
This comment is not perfect, so feel free to amend anything that is unclear.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26638#discussion_r2588762916
More information about the hotspot-dev
mailing list