RFR: 8363620: AArch64: reimplement emit_static_call_stub() [v2]

Fei Gao fgao at openjdk.org
Tue Dec 9 11:09:04 UTC 2025


On Sun, 30 Nov 2025 17:08:22 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Patch 'isb' to 'nop'
>>  - Merge branch 'master' into reimplement-static-call-stub
>>  - 8363620: AArch64: reimplement emit_static_call_stub()
>>    
>>    In the existing implementation, the static call stub typically
>>    emits a sequence like:
>>    `isb; movz; movk; movk; movz; movk; movk; br`.
>>    
>>    This patch reimplements it using a more compact and patch-friendly
>>    sequence:
>>    ```
>>    ldr x12, Label_data
>>    ldr x8, Label_entry
>>    br x8
>>    Label_data:
>>      0x00000000
>>      0x00000000
>>    Label_entry:
>>      0x00000000
>>      0x00000000
>>    ```
>>    The new approach places the target addresses adjacent to the code
>>    and loads them dynamically. This allows us to update the call
>>    target by modifying only the data in memory, without changing any
>>    instructions. This avoids the need for I-cache flushes or
>>    issuing an `isb`[1], which are both relatively expensive
>>    operations.
>>    
>>    While emitting direct branches in static stubs for small code
>>    caches can save 2 bytes compared to the new implementation,
>>    modifying those branches still requires I-cache flushes or an
>>    `isb`. This patch unifies the code generation by emitting the
>>    same static stubs for both small and large code caches.
>>    
>>    A microbenchmark (StaticCallStub.java) shows the average time
>>    per operation dropping by approximately 43% (lower is better).
>>    
>>    Benchmark                        (length)  Mode  Cnt    Master     Patch  Units
>>    StaticCallStubFar.callCompiled       1000  avgt    5    39.346    22.474  us/op
>>    StaticCallStubFar.callCompiled      10000  avgt    5    390.05   218.478  us/op
>>    StaticCallStubFar.callCompiled     100000  avgt    5  3869.264  2174.001  us/op
>>    StaticCallStubNear.callCompiled      1000  avgt    5    39.093    22.582  us/op
>>    StaticCallStubNear.callCompiled     10000  avgt    5   387.319   217.398  us/op
>>    StaticCallStubNear.callCompiled    100000  avgt    5  3855.825  2206.923  us/op
>>    
>>    All tests in Tier1 to Tier3, under both release and debug builds,
>>    have passed.
>>    
>>    [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads
>
> Your benchmark doesn't work. Please fix it.

@theRealAph Thanks for your review and for your time!
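
For context, the data-only patching idea in the quoted description can be modelled roughly as below. This is a minimal sketch, not the HotSpot code: `StubData`, `set_stub_target` and `load_stub_target` are hypothetical names, and the use of default sequentially consistent atomics is an assumption of the sketch rather than a statement about the synchronization the patch actually relies on.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of the two 64-bit data slots that follow the
// three instructions in the new stub. Not HotSpot's actual types.
struct StubData {
    std::atomic<std::uint64_t> metadata{0};  // word read by `ldr x12, Label_data`
    std::atomic<std::uint64_t> entry{0};     // word read by `ldr x8, Label_entry`
};

// Values the stub would have in x12 and x8 just before `br x8`.
struct LoadedTarget {
    std::uint64_t metadata;
    std::uint64_t entry;
};

// Retarget the stub by storing data only. No instruction bytes change,
// which is why no I-cache flush or `isb` is needed for this step.
// (Ordering here is the sketch's assumption, not HotSpot's protocol.)
inline void set_stub_target(StubData& stub,
                            std::uint64_t metadata,
                            std::uint64_t entry) {
    stub.metadata.store(metadata);
    stub.entry.store(entry);
}

// Mirrors the two loads in program order: metadata first, then entry.
inline LoadedTarget load_stub_target(const StubData& stub) {
    std::uint64_t metadata = stub.metadata.load();
    std::uint64_t entry = stub.entry.load();
    return LoadedTarget{metadata, entry};
}
```

A caller would create one `StubData`, call `set_stub_target` whenever the call is resolved or reset, and the executing thread would observe the new words through ordinary data loads.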

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26638#issuecomment-3631651452

