RFR: 8352415: x86: Tighten up template interpreter method entry code
Aleksey Shipilev
shade at openjdk.org
Wed Mar 19 13:50:46 UTC 2025
On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Interpreter performance is still important for faster startup, since the interpreter carries the application until the compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in the template interpreter to make it faster.
>
> One of those improvements is tightening up method entry code. Profiling shows that the hottest path in method entry for non-native methods is resolving the Java mirror to store the GC root for the currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency.
>
> We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in.
>
> Additional testing:
> - [x] Ad-hoc `-Xint` benchmarks
> - [ ] Linux x86_64 server fastdebug, `all`
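To illustrate the chained accesses described above, here is a minimal C++ sketch. The struct names and fields are simplified, hypothetical stand-ins for the HotSpot metadata, not the real layouts, and the functions are not the code from the PR. The sketch assumes the mirror is reached by going from the method through its constant pool to the holder class, and that method entry has already loaded the ConstMethod pointer for other reasons.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for HotSpot metadata (assumed shapes,
// not the real class layouts).
struct Oop           { int id; };
struct OopHandle     { Oop** slot; };               // handle pointing at the mirror slot
struct InstanceKlass { OopHandle java_mirror; };
struct ConstantPool  { InstanceKlass* pool_holder; };
struct ConstMethod   { ConstantPool* constants; };
struct Method        { ConstMethod* const_method; };

// Naive path: five dependent loads, each waiting on the previous one.
Oop* load_mirror_chained(const Method* m) {
  assert(m != nullptr);
  const ConstMethod*   cm     = m->const_method;         // load 1
  const ConstantPool*  cp     = cm->constants;           // load 2
  const InstanceKlass* holder = cp->pool_holder;         // load 3
  Oop**                slot   = holder->java_mirror.slot; // load 4
  return *slot;                                          // load 5
}

// Reuse path: if the entry code already holds the ConstMethod pointer
// (assumed here), the first load of the chain does not need repeating.
Oop* load_mirror_reusing(const ConstMethod* cm) {
  assert(cm != nullptr);
  const InstanceKlass* holder = cm->constants->pool_holder;
  return *holder->java_mirror.slot;
}
```

Both functions return the same mirror; the second starts one step further along the chain. In generated interpreter code, the remaining loads can also be interleaved with unrelated entry work so that their latency overlaps with it.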
Motivational improvements on 5950X: interpreted code is about 1.8% faster (mean 1.533 s before, 1.506 s after).
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms1g -Xmx1g -XX:+UseSerialGC \
-Xint -cp JavacBenchApp.jar JavacBenchApp 1
# Before
Time (mean ± σ): 1.533 s ± 0.013 s [User: 1.479 s, System: 0.051 s]
Range (min … max): 1.517 s … 1.551 s 10 runs
# After
Time (mean ± σ): 1.506 s ± 0.012 s [User: 1.451 s, System: 0.051 s]
Range (min … max): 1.493 s … 1.528 s 10 runs
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24114#issuecomment-2736705387