RFR: 8363620: AArch64: reimplement emit_static_call_stub() [v3]

Fei Gao fgao at openjdk.org
Tue Dec 2 13:54:01 UTC 2025


> In the existing implementation, the static call stub typically emits a sequence like:
> `isb; movz; movk; movk; movz; movk; movk; br`.
> 
> This patch reimplements it using a more compact and patch-friendly sequence:
> 
> ldr x12, Label_data
> ldr x8, Label_entry
> br x8
> Label_data:
>   0x00000000
>   0x00000000
> Label_entry:
>   0x00000000
>   0x00000000
> 
> The new approach places the target addresses adjacent to the code and loads them dynamically. This allows us to update the call target by modifying only the data in memory, without changing any instructions. This avoids the need for I-cache flushes or issuing an `isb`[1], which are both relatively expensive operations.
> 
> While emitting direct branches in static stubs for small code caches can save 2 instructions compared to the new implementation, modifying those branches still requires I-cache flushes or an `isb`. This patch unifies the code generation by emitting the same static stubs for both small and large code caches.
> 
> A microbenchmark (StaticCallStub.java) shows the average time per operation dropping by approximately 43%.
> 
> 
> Benchmark                        (length)  Mode  Cnt    Master     Patch  Units
> StaticCallStubFar.callCompiled       1000  avgt    5    39.346    22.474  us/op
> StaticCallStubFar.callCompiled      10000  avgt    5    390.05   218.478  us/op
> StaticCallStubFar.callCompiled     100000  avgt    5  3869.264  2174.001  us/op
> StaticCallStubNear.callCompiled      1000  avgt    5    39.093    22.582  us/op
> StaticCallStubNear.callCompiled     10000  avgt    5   387.319   217.398  us/op
> StaticCallStubNear.callCompiled    100000  avgt    5  3855.825  2206.923  us/op
> 
> 
> All tests in Tier1 to Tier3, under both release and debug builds, have passed.
> 
> [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads
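
The data-only patching described in the quoted text can be sketched as a small C++ model. This is an illustration under stated assumptions, not HotSpot code: `StubModel`, `encode_ldr_literal`, `encode_br`, `patch_stub` and `simulate_call` are hypothetical names, the literal offsets assume the two 64-bit slots directly follow the three instructions with no alignment padding (as drawn in the description), and the release/acquire ordering is an assumption rather than what the patch actually does.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// AArch64 "LDR Xt, <label>" (64-bit literal load): imm19 holds byte_offset / 4.
inline uint32_t encode_ldr_literal(unsigned rt, int32_t byte_offset) {
    assert(rt < 32 && byte_offset % 4 == 0);
    return 0x58000000u |
           ((static_cast<uint32_t>(byte_offset / 4) & 0x7FFFFu) << 5) | rt;
}

// AArch64 "BR Xn" (branch to register).
inline uint32_t encode_br(unsigned rn) {
    assert(rn < 32);
    return 0xD61F0000u | (rn << 5);
}

// Hypothetical model of one static call stub. C++ may pad between `code`
// and `data`; the model only shows which parts change when patching.
struct StubModel {
    // ldr x12, Label_data ; ldr x8, Label_entry ; br x8 (never rewritten)
    std::array<uint32_t, 3> code{{encode_ldr_literal(12, 12),
                                  encode_ldr_literal(8, 16),
                                  encode_br(8)}};
    std::atomic<uint64_t> data{0};   // Label_data: value loaded into x12
    std::atomic<uint64_t> entry{0};  // Label_entry: branch target loaded into x8
};

// Retarget the stub by writing only the literal slots (assumed ordering).
inline void patch_stub(StubModel& s, uint64_t data, uint64_t entry) {
    s.data.store(data, std::memory_order_release);
    s.entry.store(entry, std::memory_order_release);
}

// Mimic executing the stub: fill x12 and return the address branched to.
inline uint64_t simulate_call(const StubModel& s, uint64_t& x12) {
    x12 = s.data.load(std::memory_order_acquire);
    return s.entry.load(std::memory_order_acquire);
}
```

In this model `patch_stub` never touches `code`, so the three instruction words are identical before and after retargeting; only the two literal slots change, which is the property that lets the stub be updated without rewriting instructions.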

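The roughly 43% figure can be checked against the benchmark table with a few lines of C++. This is only a sketch of the arithmetic: `Row`, `kRows`, `reduction_percent` and `mean_reduction_percent` are hypothetical names, and the numbers are copied from the Master and Patch columns above.

```cpp
#include <cassert>
#include <cstddef>

// One benchmark row: average time per operation in microseconds.
struct Row {
    double master_us;
    double patch_us;
};

// Master/Patch columns from the table (Far rows first, then Near rows).
static const Row kRows[] = {
    {39.346, 22.474},   {390.05, 218.478},  {3869.264, 2174.001},
    {39.093, 22.582},   {387.319, 217.398}, {3855.825, 2206.923},
};

// Percentage by which the patch reduces the time per operation.
inline double reduction_percent(const Row& r) {
    assert(r.master_us > 0.0);
    return (r.master_us - r.patch_us) / r.master_us * 100.0;
}

// Arithmetic mean of the per-row reductions.
inline double mean_reduction_percent() {
    const std::size_t n = sizeof(kRows) / sizeof(kRows[0]);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += reduction_percent(kRows[i]);
    }
    return sum / static_cast<double>(n);
}
```

The per-row reductions fall between about 42% and 44%, so the quoted figure describes a drop in time per operation rather than a throughput ratio.
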
Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Update comments and fix benchmarks
 - The patch is contributed by @theRealAph
 - Merge branch 'master' into reimplement-static-call-stub
 - Patch 'isb' to 'nop'
 - Merge branch 'master' into reimplement-static-call-stub
 - 8363620: AArch64: reimplement emit_static_call_stub()
   

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26638/files
  - new: https://git.openjdk.org/jdk/pull/26638/files/f5a83e30..6d3669c1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26638&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26638&range=01-02

  Stats: 4590 lines in 177 files changed: 3733 ins; 423 del; 434 mod
  Patch: https://git.openjdk.org/jdk/pull/26638.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26638/head:pull/26638

PR: https://git.openjdk.org/jdk/pull/26638

