RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v3]
Andrew Haley
aph at openjdk.java.net
Fri Jul 9 08:28:54 UTC 2021
On Fri, 9 Jul 2021 06:44:14 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The VectorMask query (`trueCount, firstTrue, lastTrue`) APIs can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
>>
>> Here is the performance comparison data for the three APIs with and without this patch:
>>
>> Benchmark (bits) (inputs) Before After Gain Units
>> MaskQueryOperationsBenchmark.testFirstTrueByte 128 1 42583.141 103900.253 2.44 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte 128 2 37158.470 108234.110 2.91 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte 128 3 42583.584 108235.231 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt 128 1 42583.625 108236.859 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt 128 2 42583.288 107368.205 2.52 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt 128 3 42583.673 108232.371 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong 128 1 42583.408 108232.617 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong 128 2 42583.443 107367.035 2.52 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong 128 3 42583.111 108236.036 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort 128 1 42583.536 108230.365 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort 128 2 41231.639 108239.148 2.62 ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort 128 3 42583.630 108238.542 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte 128 1 42584.067 108238.989 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte 128 2 36845.596 108234.297 2.94 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte 128 3 42583.759 108237.501 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt 128 1 42583.319 108236.218 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt 128 2 42583.112 108234.516 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt 128 3 42583.340 108238.777 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong 128 1 42581.004 108233.701 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong 128 2 42583.266 108238.323 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong 128 3 42583.542 108234.327 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort 128 1 42583.552 108238.011 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort 128 2 41231.142 108237.919 2.63 ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort 128 3 44784.270 108238.011 2.42 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte 128 1 37075.556 108233.571 2.92 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte 128 2 37527.370 108233.396 2.88 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte 128 3 36585.788 107372.032 2.93 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt 128 1 42583.608 108233.721 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt 128 2 42584.733 107369.578 2.52 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt 128 3 42583.623 107367.859 2.52 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong 128 1 42583.671 107368.004 2.52 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong 128 2 42583.661 108233.301 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong 128 3 42583.015 108232.783 2.54 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort 128 1 41229.280 108233.369 2.63 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort 128 2 41231.914 107366.904 2.60 ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort 128 3 41231.734 108233.606 2.63 ops/ms
>>
>> All VectorAPI jtreg tests pass when patch [2] is applied together with this one.
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
>> [2] https://github.com/openjdk/jdk17/pull/168
>>
>> Tested tier1 and jdk:tier3.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove the beginning "negr" for "firsttrue,lasttrue"
src/hotspot/cpu/aarch64/aarch64_neon.ad line 5340:
> 5338: format %{ "vmask_firsttrue $dst, $src\t# vector (4I/4S/2I)" %}
> 5339: ins_encode %{
> 5340: // Input "src" is a vector of boolean with "0/1" as the element values.
Say "vector of bytes." I read this as a vector of bits.
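To make the "vector of bytes" reading concrete, here is a scalar sketch of what the three queries compute when the mask is held as one byte per lane with value 0 or 1. The class and method names are hypothetical and this is not code from the patch. The return values for an all-false mask (the lane count for `firstTrue`, -1 for `lastTrue`) follow the documented `VectorMask` behaviour.

```java
public final class MaskQueryReference {
    /** Number of set lanes; each lane is assumed to be exactly 0 or 1. */
    static int trueCount(byte[] mask) {
        int count = 0;
        for (byte lane : mask) {
            count += lane;
        }
        return count;
    }

    /** Index of the first set lane, or mask.length if no lane is set. */
    static int firstTrue(byte[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i] != 0) {
                return i;
            }
        }
        return mask.length;
    }

    /** Index of the last set lane, or -1 if no lane is set. */
    static int lastTrue(byte[] mask) {
        for (int i = mask.length - 1; i >= 0; i--) {
            if (mask[i] != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] mask = {0, 0, 1, 0, 1, 1, 0, 0};
        System.out.println(trueCount(mask)); // 3
        System.out.println(firstTrue(mask)); // 2
        System.out.println(lastTrue(mask));  // 5
    }
}
```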
src/hotspot/cpu/aarch64/aarch64_neon.ad line 5381:
> 5379: Label FIRST_TRUE_INDEX;
> 5380:
> 5381: // Move the lower 64-bits to a general register and check whether the
"64 bits". No hyphen.
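For readers unfamiliar with the idea of handling a 128-bit byte mask 64 bits at a time, the following Java sketch shows one common way to find the first set lane from two 64-bit halves. It is an illustration only, under the assumption of 16 byte lanes packed with lane 0 in the least significant byte. The names `packLanes` and `firstTrue` are hypothetical, and the sketch does not claim to mirror the instruction sequence in the patch.

```java
public final class PackedFirstTrue {
    /** Packs 8 byte lanes (values 0 or 1) starting at offset into a long, lane 0 lowest. */
    static long packLanes(byte[] mask, int offset) {
        long packed = 0L;
        for (int i = 0; i < 8; i++) {
            packed |= (long) (mask[offset + i] & 0xFF) << (8 * i);
        }
        return packed;
    }

    /** First set lane of a 16-lane byte mask split into low and high 64 bits; 16 if none. */
    static int firstTrue(long lo, long hi) {
        if (lo != 0) {
            // Trailing zero bits divided by 8 gives the byte (lane) index.
            return Long.numberOfTrailingZeros(lo) >>> 3;
        }
        // numberOfTrailingZeros(0) is 64, so an empty high half yields 8 + 8 = 16.
        return 8 + (Long.numberOfTrailingZeros(hi) >>> 3);
    }

    public static void main(String[] args) {
        byte[] mask = new byte[16];
        mask[10] = 1;
        mask[13] = 1;
        long lo = packLanes(mask, 0);
        long hi = packLanes(mask, 8);
        System.out.println(firstTrue(lo, hi)); // 10
    }
}
```

Checking the low half first keeps the common case to a single count-trailing-zeros step, and the all-false case falls out of the arithmetic without an extra branch.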
-------------
PR: https://git.openjdk.java.net/jdk/pull/4699