RFR: 8363620: AArch64: reimplement emit_static_call_stub() [v2]
Fei Gao
fgao at openjdk.org
Fri Nov 28 10:17:50 UTC 2025
> In the existing implementation, the static call stub typically emits a sequence like:
> `isb; movk; movz; movz; movk; movz; movz; br`.
>
> This patch reimplements it using a more compact and patch-friendly sequence:
>
> ldr x12, Label_data
> ldr x8, Label_entry
> br x8
> Label_data:
> 0x00000000
> 0x00000000
> Label_entry:
> 0x00000000
> 0x00000000
>
> The new approach places the target addresses adjacent to the code and loads them dynamically. This allows us to update the call target by modifying only the data in memory, without changing any instructions. This avoids the need for I-cache flushes or issuing an `isb`[1], which are both relatively expensive operations.
>
> While emitting direct branches in static stubs for small code caches can save 2 instructions compared to the new implementation, modifying those branches still requires I-cache flushes or an `isb`. This patch unifies the code generation by emitting the same static stubs for both small and large code caches.
>
> A microbenchmark (StaticCallStub.java) demonstrates a performance uplift of approximately 43%.
>
>
> Benchmark                        (length)  Mode  Cnt    Master     Patch  Units
> StaticCallStubFar.callCompiled       1000  avgt    5    39.346    22.474  us/op
> StaticCallStubFar.callCompiled      10000  avgt    5    390.05   218.478  us/op
> StaticCallStubFar.callCompiled     100000  avgt    5  3869.264  2174.001  us/op
> StaticCallStubNear.callCompiled      1000  avgt    5    39.093    22.582  us/op
> StaticCallStubNear.callCompiled     10000  avgt    5   387.319   217.398  us/op
> StaticCallStubNear.callCompiled    100000  avgt    5  3855.825  2206.923  us/op
>
>
> All tests in Tier1 to Tier3, under both release and debug builds, have passed.
>
> [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads
Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
- Patch 'isb' to 'nop'
- Merge branch 'master' into reimplement-static-call-stub
- 8363620: AArch64: reimplement emit_static_call_stub()
In the existing implementation, the static call stub typically
emits a sequence like:
`isb; movk; movz; movz; movk; movz; movz; br`.
This patch reimplements it using a more compact and patch-friendly
sequence:
```
ldr x12, Label_data
ldr x8, Label_entry
br x8
Label_data:
0x00000000
0x00000000
Label_entry:
0x00000000
0x00000000
```
The new approach places the target addresses adjacent to the code
and loads them dynamically. This allows us to update the call
target by modifying only the data in memory, without changing any
instructions. This avoids the need for I-cache flushes or
issuing an `isb`[1], which are both relatively expensive
operations.
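To make the patching model above concrete, here is a minimal C++ sketch. It is not HotSpot code: `StubData`, `set_to_target` and `execute_loads` are hypothetical names, the size constants assume 4-byte instructions and no alignment padding, and the sketch makes no claim about how the two words are ordered across threads.
```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Size comparison, assuming 4-byte AArch64 instructions and no padding.
constexpr int kInstructionBytes = 4;
constexpr int kOldStubBytes = 8 * kInstructionBytes;          // isb + 6 moves + br
constexpr int kNewStubBytes = 3 * kInstructionBytes + 2 * 8;  // ldr, ldr, br + two 64-bit words
static_assert(kNewStubBytes < kOldStubBytes, "new stub should be more compact");

// Hypothetical model of the two 64-bit words that follow the instructions.
struct StubData {
    std::atomic<uint64_t> data{0};   // word at Label_data, loaded into x12
    std::atomic<uint64_t> entry{0};  // word at Label_entry, loaded into x8
};

// Retarget the stub by storing to data memory only; no instruction changes,
// so in this model there is nothing for an I-cache flush to cover.
inline void set_to_target(StubData& stub, uint64_t data, uint64_t entry) {
    assert((entry & 3) == 0 && "AArch64 instructions are 4-byte aligned");
    stub.data.store(data);
    stub.entry.store(entry);
}

// What the stub's two ldr instructions would observe, in program order.
struct Loaded {
    uint64_t x12;
    uint64_t x8;
};

inline Loaded execute_loads(const StubData& stub) {
    uint64_t x12 = stub.data.load();
    uint64_t x8 = stub.entry.load();
    return {x12, x8};
}
```
Calling `set_to_target(stub, d, e)` and then `execute_loads(stub)` returns `d` and `e`, which mirrors how the stub picks up a new target on its next execution without any instruction having been rewritten.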
While emitting direct branches in static stubs for small code
caches can save 2 instructions compared to the new implementation,
modifying those branches still requires I-cache flushes or an
`isb`. This patch unifies the code generation by emitting the
same static stubs for both small and large code caches.
A microbenchmark (StaticCallStub.java) demonstrates a performance
uplift of approximately 43%.
Benchmark                        (length)  Mode  Cnt    Master     Patch  Units
StaticCallStubFar.callCompiled       1000  avgt    5    39.346    22.474  us/op
StaticCallStubFar.callCompiled      10000  avgt    5    390.05   218.478  us/op
StaticCallStubFar.callCompiled     100000  avgt    5  3869.264  2174.001  us/op
StaticCallStubNear.callCompiled      1000  avgt    5    39.093    22.582  us/op
StaticCallStubNear.callCompiled     10000  avgt    5   387.319   217.398  us/op
StaticCallStubNear.callCompiled    100000  avgt    5  3855.825  2206.923  us/op
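As a cross-check of the 43% figure, the following C++ sketch recomputes the reduction in average time per operation from the rows above. The names `Row`, `kRows`, `reduction_percent` and `mean_reduction_percent` are illustrative only, and the sketch assumes the figure refers to the reduction in us/op averaged over the six rows.
```cpp
#include <cassert>
#include <cstddef>

// One benchmark row: average time per operation (us/op) on master and with the patch.
struct Row {
    const char* name;
    int length;
    double master_us;
    double patch_us;
};

// Values copied from the table above.
constexpr Row kRows[] = {
    {"StaticCallStubFar.callCompiled", 1000, 39.346, 22.474},
    {"StaticCallStubFar.callCompiled", 10000, 390.05, 218.478},
    {"StaticCallStubFar.callCompiled", 100000, 3869.264, 2174.001},
    {"StaticCallStubNear.callCompiled", 1000, 39.093, 22.582},
    {"StaticCallStubNear.callCompiled", 10000, 387.319, 217.398},
    {"StaticCallStubNear.callCompiled", 100000, 3855.825, 2206.923},
};
constexpr std::size_t kRowCount = sizeof(kRows) / sizeof(kRows[0]);

// Percentage reduction in time per operation for one row.
inline double reduction_percent(const Row& r) {
    assert(r.master_us > 0.0);
    return (r.master_us - r.patch_us) / r.master_us * 100.0;
}

// Arithmetic mean of the per-row reductions.
inline double mean_reduction_percent() {
    double sum = 0.0;
    for (const Row& r : kRows) {
        sum += reduction_percent(r);
    }
    return sum / static_cast<double>(kRowCount);
}
```
Under that assumption every row falls between roughly 42% and 44%, and the mean is about 43%, consistent with the stated uplift.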
All tests in Tier1 to Tier3, under both release and debug builds,
have passed.
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/26638/files
- new: https://git.openjdk.org/jdk/pull/26638/files/5f9285ca..f5a83e30
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=26638&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=26638&range=00-01
Stats: 610902 lines in 6782 files changed: 419425 ins; 121578 del; 69899 mod
Patch: https://git.openjdk.org/jdk/pull/26638.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26638/head:pull/26638
PR: https://git.openjdk.org/jdk/pull/26638