Request for discussion: rewrite invokeinterface dispatch, JMH benchmark
Andrew Haley
aph-open at littlepinkcloud.com
Wed Oct 9 14:34:46 UTC 2024
On 10/9/24 10:18, Dmitry Chuyko wrote:
> Your observations are quite interesting. If you remember
> https://github.com/openjdk/jdk/pull/13460, example micro-benchmark
> improvements for x86 were ~10% and only ~3% in Naive Bayes.
In addition, if we compare and contrast your figures with those from my
(rather old) desktop machine, we see this:
Your benchmark results from the PR, before and after (ns/op), with the
relative difference in the third column:

CPU: AMD EPYC 7502P (2019)
                                  Before    After     Diff
InterfaceCalls.test1stInt2Types    5.157    5.135    0.43%
InterfaceCalls.test1stInt3Types    9.882    9.807    0.76%
InterfaceCalls.test1stInt5Types    9.864    9.802    0.63%
InterfaceCalls.test2ndInt2Types    6.664    5.432   18.49%
InterfaceCalls.test2ndInt3Types   10.411   10.046    3.51%
InterfaceCalls.test2ndInt5Types    10.49   10.075    3.96%
InterfaceCalls.testIfaceCall      46.789    46.72    0.15%
InterfaceCalls.testIfaceExtCall   50.724    46.55    8.23%
InterfaceCalls.testMonomorphic     4.823    4.826    0.06%
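The third column appears to be the relative difference (before - after) / before. A small sketch that recomputes a few rows under that assumption follows. Doing so also shows that testMonomorphic got very slightly slower (about -0.06%), so the column seems to report magnitudes only. The class and method names are hypothetical.

```java
// Recomputes the "Diff" column of the EPYC table, ASSUMING it is
// (before - after) / before * 100. Positive means the PR made it faster.
class Improvement {
    static double improvementPercent(double beforeNs, double afterNs) {
        return (beforeNs - afterNs) / beforeNs * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"test2ndInt2Types", "testIfaceExtCall", "testMonomorphic"};
        // Before/after ns/op pairs copied from the table above.
        double[][] nsPerOp = {{6.664, 5.432}, {50.724, 46.55}, {4.823, 4.826}};
        for (int i = 0; i < names.length; i++) {
            double pct = improvementPercent(nsPerOp[i][0], nsPerOp[i][1]);
            System.out.printf("%-18s %7.2f%%%n", names[i], pct);
        }
    }
}
```

Running it prints 18.49%, 8.23% and -0.06% for the three rows, matching the table apart from the sign on testMonomorphic.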
My results from today's JDK head on an AMD Ryzen Threadripper 2950X (2018)
look much more like those for the Apple M1 (ns/op):
InterfaceCalls.test1stInt2Types    2.172
InterfaceCalls.test1stInt3Types    5.721
InterfaceCalls.test1stInt5Types    6.468
InterfaceCalls.test2ndInt2Types    2.202
InterfaceCalls.test2ndInt3Types    5.981
InterfaceCalls.test2ndInt5Types    5.992
InterfaceCalls.testIfaceCall       5.722
InterfaceCalls.testIfaceExtCall    5.947
InterfaceCalls.testMonomorphic     0.990
I think the 2950X has a faster clock, but the dramatic thing is that the
testIface* benchmarks run at the same speed as all the other "real"
interface calls, about 6 ns. I guess the 2950X also has better branch
prediction.
Let's try to test that guess on the 2950X (ns/op; the Scrambled column is
InterfaceCalls.test2ndInt5TypesScrambled):

                                  Regular   Scrambled
InterfaceCalls.test2ndInt5Types     5.985      20.853

:-)
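To make the regular/scrambled distinction concrete, here is a plain-Java sketch of what the two variants might look like. It is an assumption: the actual test2ndInt5TypesScrambled source is not shown in this message, and the names Second, getIntSecond, A to E, regular, scrambled and sum are all hypothetical. It is also not a JMH benchmark, so timing it directly would not give meaningful numbers; it only shows that both variants do identical work and differ solely in the order in which receiver types reach the interface call.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Random;

// HYPOTHETICAL sketch, not the JDK's InterfaceCalls source: it only shows the
// assumed difference between a "regular" and a "scrambled" receiver sequence.
class ReceiverOrder {
    interface Second { int getIntSecond(); }   // hypothetical interface

    // Five implementors, so the call site in sum() sees five receiver types.
    static final class A implements Second { public int getIntSecond() { return 1; } }
    static final class B implements Second { public int getIntSecond() { return 2; } }
    static final class C implements Second { public int getIntSecond() { return 3; } }
    static final class D implements Second { public int getIntSecond() { return 4; } }
    static final class E implements Second { public int getIntSecond() { return 5; } }

    // Regular: types repeat in a fixed cycle A, B, C, D, E, A, B, ...
    // A predictor with enough history can usually learn such a short cycle.
    static Second[] regular(int n) {
        Second[] proto = {new A(), new B(), new C(), new D(), new E()};
        Second[] out = new Second[n];
        for (int i = 0; i < n; i++) {
            out[i] = proto[i % proto.length];
        }
        return out;
    }

    // Scrambled: the same receivers, shuffled with a fixed seed, so the type
    // sequence has no short repeating pattern for the predictor to learn.
    static Second[] scrambled(int n, long seed) {
        Second[] out = regular(n);
        // Arrays.asList is backed by 'out', so this shuffles the array in place.
        Collections.shuffle(Arrays.asList(out), new Random(seed));
        return out;
    }

    // The operation being compared: one interface call per element.
    static int sum(Second[] receivers) {
        int s = 0;
        for (Second r : receivers) {
            s += r.getIntSecond();
        }
        return s;
    }

    public static void main(String[] args) {
        // Same work and same result; only the order of receiver types differs.
        System.out.println(sum(regular(1000)) + " " + sum(scrambled(1000, 42L)));
    }
}
```

With 1000 elements each type appears 200 times, so both sums are 3000. Any timing difference measured under a proper harness would therefore come from the receiver order, which is what the branch-miss counters are meant to expose.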
Looking at some more detailed stats, the scrambled version does about twice
as many L1 data-cache loads and has a mispredicted branch on each iteration,
suggesting that the CPU always speculates an entire iteration, gets it
wrong, and then has to do it all again.
Events per op                                       Regular   Scrambled
InterfaceCalls.test2ndInt5Types:L1-dcache-loads:u    31.975      60.393
InterfaceCalls.test2ndInt5Types:branch-misses:u      ≈ 10⁻⁴       1.263
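As a rough check on these per-op numbers: the load ratio is about 1.89x, and if every extra nanosecond in the scrambled run were charged to its 1.263 branch misses per op, each miss would cost at most about 11.8 ns. Charging all of the extra time to mispredicts is a simplifying assumption (the extra loads cost something too), so this is a crude upper bound, not a measurement. The class and method names are hypothetical.

```java
// Back-of-envelope arithmetic on the 2950X per-op figures quoted above.
class CounterMath {
    static final double REGULAR_NS = 5.985, SCRAMBLED_NS = 20.853;
    static final double REGULAR_LOADS = 31.975, SCRAMBLED_LOADS = 60.393;
    static final double SCRAMBLED_MISSES = 1.263; // regular run is ~1e-4, ignored

    // How many more L1D loads per op the scrambled run does.
    static double loadRatio() { return SCRAMBLED_LOADS / REGULAR_LOADS; }

    // Extra time per op in the scrambled run.
    static double extraNsPerOp() { return SCRAMBLED_NS - REGULAR_NS; }

    // ASSUMPTION: all extra time is due to the branch misses, none to the
    // extra loads, so this over-estimates the true cost of one miss.
    static double nsPerMissUpperBound() { return extraNsPerOp() / SCRAMBLED_MISSES; }

    public static void main(String[] args) {
        System.out.printf("L1D load ratio:       %.2fx%n", loadRatio());
        System.out.printf("extra time per op:    %.2f ns%n", extraNsPerOp());
        System.out.printf("ns per miss (bound):  %.1f ns%n", nsPerMissUpperBound());
    }
}
```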
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671