RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v6]
Andrew Haley
aph at openjdk.java.net
Thu Jul 15 15:01:23 UTC 2021
On Thu, 15 Jul 2021 07:56:25 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The VectorMask query APIs (`trueCount`, `firstTrue`, `lastTrue`) can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
>>
>> Here is the performance comparison data for the three APIs with and without this patch:
>>
>> Benchmark                                        (bits)  (inputs)     Before       After  Gain  Units
>> MaskQueryOperationsBenchmark.testFirstTrueByte      128         1  42583.141  103900.253  2.44  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte      128         2  37158.470  108234.110  2.91  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte      128         3  42583.584  108235.231  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt       128         1  42583.625  108236.859  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt       128         2  42583.288  107368.205  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt       128         3  42583.673  108232.371  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong      128         1  42583.408  108232.617  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong      128         2  42583.443  107367.035  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong      128         3  42583.111  108236.036  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort     128         1  42583.536  108230.365  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort     128         2  41231.639  108239.148  2.62  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort     128         3  42583.630  108238.542  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte       128         1  42584.067  108238.989  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte       128         2  36845.596  108234.297  2.94  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte       128         3  42583.759  108237.501  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt        128         1  42583.319  108236.218  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt        128         2  42583.112  108234.516  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt        128         3  42583.340  108238.777  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong       128         1  42581.004  108233.701  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong       128         2  42583.266  108238.323  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong       128         3  42583.542  108234.327  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort      128         1  42583.552  108238.011  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort      128         2  41231.142  108237.919  2.63  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort      128         3  44784.270  108238.011  2.42  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte      128         1  37075.556  108233.571  2.92  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte      128         2  37527.370  108233.396  2.88  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte      128         3  36585.788  107372.032  2.93  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt       128         1  42583.608  108233.721  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt       128         2  42584.733  107369.578  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt       128         3  42583.623  107367.859  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong      128         1  42583.671  107368.004  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong      128         2  42583.661  108233.301  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong      128         3  42583.015  108232.783  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort     128         1  41229.280  108233.369  2.63  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort     128         2  41231.914  107366.904  2.60  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort     128         3  41231.734  108233.606  2.63  ops/ms
>>
>> All VectorAPI jtreg tests pass when patch [2] is applied together with this one.
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
>> [2] https://github.com/openjdk/jdk17/pull/168
>>
>> Tested tier1 and jdk:tier3.
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - Merge firsttrue 8B/LT8B codes into a macro processor
> - Merge branch jdk:master into JDK-8269725
> - Merge jdk:master into JDK-8269725
> - Add more comments
> - Remove the beginning "negr" for "firsttrue,lasttrue"
> - Merge branch 'jdk:master' into JDK-8269725
> - 8269725: AArch64: Add VectorMask query implementation for NEON
OK, thanks.
-------------
Marked as reviewed by aph (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4699
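
For readers unfamiliar with the three mask queries named in the quoted description, the following is a minimal scalar sketch of what they compute. It is not the NEON code from this PR and does not use jdk.incubator.vector; the class name MaskQueryReference and the boolean[] representation of a mask are hypothetical, chosen only for illustration. The return values for an all-false mask (the lane count for firstTrue, -1 for lastTrue) follow the VectorMask Javadoc as I understand it and should be checked against the JDK version in use.

```java
// Hypothetical scalar reference for the VectorMask queries discussed above.
// A mask is modelled as a boolean[] with one element per lane (an assumption
// made for illustration; the real API operates on VectorMask<E>).
final class MaskQueryReference {
    private MaskQueryReference() {}

    // Number of lanes that are set.
    static int trueCount(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) {
                count++;
            }
        }
        return count;
    }

    // Index of the first set lane, or the lane count if none is set.
    static int firstTrue(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                return i;
            }
        }
        return lanes.length;
    }

    // Index of the last set lane, or -1 if none is set.
    static int lastTrue(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, false, true};
        System.out.println(trueCount(mask)); // 2
        System.out.println(firstTrue(mask)); // 1
        System.out.println(lastTrue(mask));  // 3
    }
}
```

A loop like this is roughly what a non-intrinsified fallback has to do per call, which is one plausible reading of why the "Before" column in the quoted table is uniformly slower; the patch itself is the authority on what the NEON sequence looks like.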