RFR: 8307352: AARCH64: Improve itable_stub
Andrew Haley
aph at openjdk.org
Tue Jun 13 14:36:02 UTC 2023
On Thu, 4 May 2023 07:36:43 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
> This is a change for AArch64 similar to https://github.com/openjdk/jdk/pull/13460
>
> The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. In the first loop we look for a match with resolved_klass, checking for a match with holder_klass along the way. The second loop then continues iterating over the itable from that point (it does not start over), checking only for a match with holder_klass.
>
> InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures:
>
>
> Cortex-A53 (Pi 3 Model B Rev 1.2)
>
> test1stInt2Types      37.5    37.358    0.38
> test1stInt3Types   160.166    148.04    8.19
> test1stInt5Types   158.131   147.955    6.88
> test2ndInt2Types    52.634    53.291   -1.23
> test2ndInt3Types    201.39   181.603   10.90
> test2ndInt5Types   195.722   176.707   10.76
> testIfaceCall      157.453   140.498   12.07
> testIfaceExtCall    175.46   154.351   13.68
> testMonomorphic     32.052    32.039    0.04
> AVG:                                    6.85
>
> Cortex-A72 (Pi 4 Model B Rev 1.2)
>
> test1stInt2Types   27.4796   27.4738    0.02
> test1stInt3Types   66.0085   64.9374    1.65
> test1stInt5Types   67.9812   66.2316    2.64
> test2ndInt2Types   32.0581    32.062   -0.01
> test2ndInt3Types   68.2715   65.6643    3.97
> test2ndInt5Types   68.1012   65.8024    3.49
> testIfaceCall      64.0684   64.1811   -0.18
> testIfaceExtCall   91.6226   81.5867   12.30
> testMonomorphic    26.7161   26.7142    0.01
> AVG:                                    2.66
>
> Neoverse N1 (m6g.metal)
>
> test1stInt2Types    2.9104    2.9086    0.06
> test1stInt3Types   10.9642   10.2909    6.54
> test1stInt5Types   10.9607   10.2856    6.56
> test2ndInt2Types    3.3410    3.3478   -0.20
> test2ndInt3Types   12.3291   11.3089    9.02
> test2ndInt5Types    12.328   11.2704    9.38
> testIfaceCall      11.0598   10.3657    6.70
> testIfaceExtCall   13.0692   11.2826   15.84
> testMonomorphic     2.2354    2.2341    0.06
> AVG:                                    6.00
>
> Neoverse V1 (c7g.2xlarge)
>
> test1stInt2Types    2.2317    2.2320   -0.01
> test1stInt3Types    6.6884    6.1911    8.03
> test1stInt5Types    6.7334    6.2193    8.27
> test2ndInt2Types    2.4002    2.4013   -0.04
> test2ndInt3Types    7.9603    7.0372   13.12
> test2ndInt5Types    7.9532    7.0474   12.85
> testIfaceCall       6.7028    6.3272    5.94
> testIfaceExtCall    8.3253    6.9416   19.93
> testMonomorphic     1.2446    1.2544   -0.79
> AVG:                                    7.48
>
>
> Testing...
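
For reference, the two-loop scan described in the quoted text can be modelled roughly as below. This is a minimal C++ sketch, not the HotSpot stub: `ItableEntry` and `lookup_two_loops` are hypothetical names, the table is assumed to end with a null interface entry, and both klass arguments are assumed to be non-null.

```cpp
#include <optional>

// Hypothetical stand-in for one itable offset entry: an interface identity
// plus the offset of that interface's method table. A null interface_klass
// marks the end of the table (an assumption of this sketch).
struct ItableEntry {
    const void* interface_klass;
    int offset;
};

// Returns the offset recorded for holder_klass, or std::nullopt if either
// resolved_klass or holder_klass is absent from the table.
std::optional<int> lookup_two_loops(const ItableEntry* table,
                                    const void* resolved_klass,
                                    const void* holder_klass) {
    const ItableEntry* e = table;
    std::optional<int> holder_offset;

    // Loop 1: search for resolved_klass, remembering holder_klass if we
    // happen to pass it on the way.
    for (;; ++e) {
        if (e->interface_klass == nullptr) return std::nullopt;
        if (e->interface_klass == holder_klass) holder_offset = e->offset;
        if (e->interface_klass == resolved_klass) break;
    }
    if (holder_offset) return holder_offset;

    // Loop 2: continue from the entry after resolved_klass (no restart),
    // now checking only for holder_klass.
    for (++e; e->interface_klass != nullptr; ++e) {
        if (e->interface_klass == holder_klass) return e->offset;
    }
    return std::nullopt;
}
```

Each entry is visited at most once, which is the point of replacing the two separate iterations.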
Thanks.
For this to be reviewable, we'll need:
- A benchmark, and some data.
- An explanation of why it's better than the existing implementation.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1255:
> 1253: // }
> 1254: // } while (temp_itbl_klass != 0);
> 1255: // goto L_no_such_interface // Not found.
I don't think the pseudocode matches the assembly. This is more like:
    do {
      temp_itbl_klass = *(scan_temp += scan_step);
      if (holder_klass == temp_itbl_klass) ...
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1283:
> 1281: bind(L_loop_scan_resolved_entry);
> 1282: cmp(holder_klass, temp_itbl_klass);
> 1283: csel(holder_offset, scan_temp, holder_offset, Assembler::EQ);
It's worth being cautious about conditional selects. In out-of-order machines they help when the outcome genuinely is unpredictable and the probability of each is close to 50-50. Benchmarks can fool us because they are often designed to be random, but real-world code rarely is.
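
To make the trade-off concrete, here is a minimal C++ sketch of the two shapes such a scan can take. It is illustrative only: `Entry`, `find_offset_branch` and `find_offset_select` are hypothetical names, not HotSpot code, and a compiler is free to emit a branch or a conditional select for either function.

```cpp
#include <cstddef>

// Hypothetical table entry, used only for this illustration.
struct Entry {
    const void* klass;
    int offset;
};

// Branch form: the update is guarded by a compare-and-branch. When the
// branch is well predicted, the loop can run ahead speculatively without
// waiting for each comparison to resolve.
int find_offset_branch(const Entry* table, std::size_t n,
                       const void* holder, int not_found) {
    int result = not_found;
    for (std::size_t i = 0; i < n; ++i) {
        if (table[i].klass == holder) {
            result = table[i].offset;
        }
    }
    return result;
}

// Select form: mirrors csel(holder_offset, scan_temp, holder_offset, EQ).
// There is nothing to mispredict, but result now carries a data dependency
// on the comparison in every iteration.
int find_offset_select(const Entry* table, std::size_t n,
                       const void* holder, int not_found) {
    int result = not_found;
    for (std::size_t i = 0; i < n; ++i) {
        result = (table[i].klass == holder) ? table[i].offset : result;
    }
    return result;
}
```

Both functions return the same value; which one is faster depends on how predictable the comparison is in the real workload, which a randomised benchmark may not reflect.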
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13792#issuecomment-1534360182
PR Review Comment: https://git.openjdk.org/jdk/pull/13792#discussion_r1221626714
PR Review Comment: https://git.openjdk.org/jdk/pull/13792#discussion_r1222732706