RFR: 8352415: x86: Tighten up template interpreter method entry code
Andrew Dinn
adinn at openjdk.org
Wed Mar 19 15:18:08 UTC 2025
On Wed, 19 Mar 2025 13:44:40 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Interpreter performance is still important for faster startup, since it carries the application until the compilers kick in. After looking at Leyden scenarios in Xint mode, I believe incremental improvements are possible in the template interpreter to make it faster.
>
> One of those improvements is tightening up method entry code. Profiling shows the hottest path in the whole ordeal for non-native methods is resolving the Java mirror to store the GC root for the currently executing Method*. It involves 4-5 chained memory accesses, which incur significant latency.
>
> We can massage the code to reuse some memory accesses and also spread them out to allow more latency-hiding hardware mechanisms to kick in.
>
> Additional testing:
> - [x] Ad-hoc `-Xint` benchmarks
> - [ ] Linux x86_64 server fastdebug, `all`
Looks good to me.
Have you looked at aarch64 to see if this is also a bottleneck there? It could use a similar trick to avoid calling load_mirror (which repeats the first two of the three loads that initialize rcpool).
-------------
Marked as reviewed by adinn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24114#pullrequestreview-2698928874