RFR: 8305959: Improve itable_stub
Boris Ulasevich
bulasevich at openjdk.org
Fri May 5 18:16:20 UTC 2023
On Thu, 13 Apr 2023 14:33:52 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
> Async profiler shows that applications spend up to 10% in itable_stubs.
>
> The current inefficiency of itable stubs is as follows. The generated itable_stub scans itable twice: first it checks if the object class is a subtype of the resolved_class, and then it finds the holder_class that implements the method. I suggest doing this in one pass: with a first loop over itable, check pointer equality to both holder_class and resolved_class. Once we have finished searching for resolved_class, continue searching for holder_class in a separate loop if it has not yet been found.
>
> This approach gives 1-10% improvement on the synthetic benchmarks and 3% improvement on Naive Bayes benchmark from the Renaissance Benchmark Suite (Intel Xeon X5675).
Hi Andrew. Thank you.
The goal of this PR is to remove redundant work from the itable stubs, which can spend a significant amount of time scanning itables. I started looking into this because some applications spend a noticeable share of their run time in this code.
The itable assembly stubs contain repetitive code - the current algorithm gets offsets and iterates over the itable data twice. I propose to do both lookups in a single pass over the interface table: once we have retrieved the interface klass pointer, we can perform both checks on it.
So the new algorithm consists of two loops. In the first loop we look for a match to resolved_klass, checking for a match to holder_klass along the way. If holder_klass has not been found by then, we continue iterating over the itable in a second loop that checks for a match with holder_klass only.
This way we can almost double the performance of the itable lookup.
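To make the loop structure concrete, here is a minimal C++ sketch of the two-scan baseline and the single-pass variant described above. It is not the stub code from the PR: the real stubs are generated assembly working on HotSpot's own itable layout. `ItableEntry`, `kNotFound`, `lookup_two_pass` and `lookup_one_pass` are hypothetical names introduced only for illustration, and the table is modelled as a null-terminated array of (interface, offset) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified itable entry: an interface identity plus the
// offset of that interface's method block. Not the real HotSpot layout.
struct ItableEntry {
    const void* interface_klass;  // nullptr marks the end of the table
    int method_block_offset;
};

constexpr int kNotFound = -1;  // stands in for the stub's failure exit

// Baseline shape: scan once for resolved_klass, then again for holder_klass.
int lookup_two_pass(const std::vector<ItableEntry>& itable,
                    const void* resolved_klass, const void* holder_klass) {
    bool is_subtype = false;
    for (const ItableEntry& e : itable) {
        if (e.interface_klass == nullptr) break;
        if (e.interface_klass == resolved_klass) { is_subtype = true; break; }
    }
    if (!is_subtype) return kNotFound;
    for (const ItableEntry& e : itable) {
        if (e.interface_klass == nullptr) break;
        if (e.interface_klass == holder_klass) return e.method_block_offset;
    }
    return kNotFound;
}

// Single-pass shape: each loaded entry is compared against both klasses.
int lookup_one_pass(const std::vector<ItableEntry>& itable,
                    const void* resolved_klass, const void* holder_klass) {
    int holder_offset = kNotFound;
    std::size_t i = 0;
    // Loop 1: search for resolved_klass, remembering holder_klass if seen.
    for (; i < itable.size() && itable[i].interface_klass != nullptr; ++i) {
        const void* k = itable[i].interface_klass;
        if (k == holder_klass) holder_offset = itable[i].method_block_offset;
        if (k == resolved_klass) break;
    }
    // Reaching the end means resolved_klass is absent: the subtype check fails.
    if (i == itable.size() || itable[i].interface_klass == nullptr) return kNotFound;
    if (holder_offset != kNotFound) return holder_offset;
    // Loop 2: resolved_klass was found first; keep scanning for holder_klass only.
    for (++i; i < itable.size() && itable[i].interface_klass != nullptr; ++i) {
        if (itable[i].interface_klass == holder_klass) {
            return itable[i].method_block_offset;
        }
    }
    return kNotFound;
}
```

In this model both functions return the same result for any table in which each interface appears once; the single-pass version simply never revisits an entry it has already loaded.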
Here are some numbers from the OpenJDK micro-benchmarks, which were also enhanced as part of this PR (columns: ns/op before, ns/op after, difference; a positive difference means the patched build is faster).
CPU: Intel Xeon Platinum 8268
Benchmark                        before (ns/op)  after (ns/op)  difference
InterfaceCalls.test1stInt2Types           3.049          3.051      -0.07%
InterfaceCalls.test1stInt3Types           7.287          6.782       6.93%
InterfaceCalls.test1stInt5Types           7.324          6.596       9.94%
InterfaceCalls.test2ndInt2Types           3.542          3.456       2.43%
InterfaceCalls.test2ndInt3Types           8.234          7.376      10.42%
InterfaceCalls.test2ndInt5Types           8.349          7.425      11.07%
InterfaceCalls.testIfaceCall             35.035         29.413      16.05%
InterfaceCalls.testIfaceExtCall          40.061          32.32      19.31%
InterfaceCalls.testMonomorphic            2.644          2.652      -0.30%
geomean                                   8.081          7.382       8.65%
CPU: AMD EPYC 7502P
Benchmark                        before (ns/op)  after (ns/op)  difference
InterfaceCalls.test1stInt2Types           5.157          5.135       0.43%
InterfaceCalls.test1stInt3Types           9.882          9.807       0.76%
InterfaceCalls.test1stInt5Types           9.864          9.802       0.63%
InterfaceCalls.test2ndInt2Types           6.664          5.432      18.49%
InterfaceCalls.test2ndInt3Types          10.411         10.046       3.51%
InterfaceCalls.test2ndInt5Types           10.49         10.075       3.96%
InterfaceCalls.testIfaceCall             46.789          46.72       0.15%
InterfaceCalls.testIfaceExtCall          50.724          46.55       8.23%
InterfaceCalls.testMonomorphic            4.823          4.826      -0.06%
geomean                                  11.724         11.233       4.19%
CPU: i7-1160G7
Benchmark                        before (ns/op)  after (ns/op)  difference
InterfaceCalls.test1stInt2Types           2.822          2.748       2.62%
InterfaceCalls.test1stInt3Types           5.701          5.309       6.88%
InterfaceCalls.test1stInt5Types           5.741          5.349       6.83%
InterfaceCalls.test2ndInt2Types           2.892          2.898      -0.21%
InterfaceCalls.test2ndInt3Types           6.666          5.858      12.12%
InterfaceCalls.test2ndInt5Types           6.686          5.851      12.49%
InterfaceCalls.testIfaceCall             26.992         24.302       9.97%
InterfaceCalls.testIfaceExtCall           33.12         27.053      18.32%
InterfaceCalls.testMonomorphic            2.415          2.455      -1.66%
geomean                                   6.657          6.145       7.69%
CPU: i5-3320M
Benchmark                        before (ns/op)  after (ns/op)  difference
InterfaceCalls.test1stInt2Types          11.551         11.291       2.25%
InterfaceCalls.test1stInt3Types          65.911         34.574      47.54%
InterfaceCalls.test1stInt5Types           65.78         40.923      37.79%
InterfaceCalls.test2ndInt2Types          14.088         13.431       4.66%
InterfaceCalls.test2ndInt3Types          41.186         37.223       9.62%
InterfaceCalls.test2ndInt5Types          47.237          42.74       9.52%
InterfaceCalls.testIfaceCall            285.568        163.311      42.81%
InterfaceCalls.testIfaceExtCall         304.335        284.027       6.67%
InterfaceCalls.testMonomorphic           10.074          9.673       3.98%
geomean                                  47.373         37.681      20.46%
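
For readers who want to re-derive the summary rows, the following C++ sketch shows how the difference and geomean columns appear to be computed. This is inferred from the posted numbers rather than stated in the message: the difference is assumed to be (before - after) / before, and the geomean the geometric mean of the nine per-benchmark scores. `improvement_percent` and `geomean` are illustrative helper names, not part of the benchmark harness.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed formula: relative reduction in ns/op, as a percentage.
// Positive means the "after" build is faster.
double improvement_percent(double before, double after) {
    return (before - after) / before * 100.0;
}

// Geometric mean, computed via logarithms to avoid overflow on long inputs.
double geomean(const std::vector<double>& values) {
    assert(!values.empty());
    double log_sum = 0.0;
    for (double v : values) {
        log_sum += std::log(v);
    }
    return std::exp(log_sum / static_cast<double>(values.size()));
}
```

Under these assumptions, `improvement_percent(7.287, 6.782)` gives about 6.93, matching the test1stInt3Types row for the Xeon Platinum 8268, and feeding that CPU's nine "before" scores into `geomean` should reproduce the posted 8.081 to within rounding.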
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13460#issuecomment-1536607523