RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v2]
Andrew Haley
aph at openjdk.java.net
Thu Jul 8 09:30:51 UTC 2021
On Thu, 8 Jul 2021 07:27:59 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The VectorMask query APIs (`trueCount`, `firstTrue`, `lastTrue`) can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
>>
>> Here is the performance comparison data for the three APIs with and without this patch:
>>
>> Benchmark                                        (bits)  (inputs)     Before       After  Gain  Units
>> MaskQueryOperationsBenchmark.testFirstTrueByte      128         1  42583.141  103900.253  2.44  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte      128         2  37158.470  108234.110  2.91  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte      128         3  42583.584  108235.231  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt       128         1  42583.625  108236.859  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt       128         2  42583.288  107368.205  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt       128         3  42583.673  108232.371  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong      128         1  42583.408  108232.617  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong      128         2  42583.443  107367.035  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong      128         3  42583.111  108236.036  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort     128         1  42583.536  108230.365  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort     128         2  41231.639  108239.148  2.62  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort     128         3  42583.630  108238.542  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte       128         1  42584.067  108238.989  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte       128         2  36845.596  108234.297  2.94  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte       128         3  42583.759  108237.501  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt        128         1  42583.319  108236.218  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt        128         2  42583.112  108234.516  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt        128         3  42583.340  108238.777  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong       128         1  42581.004  108233.701  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong       128         2  42583.266  108238.323  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong       128         3  42583.542  108234.327  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort      128         1  42583.552  108238.011  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort      128         2  41231.142  108237.919  2.63  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort      128         3  44784.270  108238.011  2.42  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte      128         1  37075.556  108233.571  2.92  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte      128         2  37527.370  108233.396  2.88  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte      128         3  36585.788  107372.032  2.93  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt       128         1  42583.608  108233.721  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt       128         2  42584.733  107369.578  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt       128         3  42583.623  107367.859  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong      128         1  42583.671  107368.004  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong      128         2  42583.661  108233.301  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong      128         3  42583.015  108232.783  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort     128         1  41229.280  108233.369  2.63  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort     128         2  41231.914  107366.904  2.60  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort     128         3  41231.734  108233.606  2.63  ops/ms
>>
>> All Vector API jtreg tests pass when patch [2] is applied together with this one.
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
>> [2] https://github.com/openjdk/jdk17/pull/168
>>
>> Tested tier1 and jdk:tier3.
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge branch 'jdk:master' into JDK-8269725
> - 8269725: AArch64: Add VectorMask query implementation for NEON
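The three query operations have simple scalar semantics, as documented for `VectorMask`. `trueCount()` counts the set lanes. `firstTrue()` returns the index of the lowest set lane, or `VLENGTH` if no lane is set. `lastTrue()` returns the index of the highest set lane, or -1 if no lane is set.

The following is a minimal C++ reference model of those semantics, for checking a backend sequence against. It assumes a hypothetical 64-bit mask of eight byte lanes, with lane 0 in the least significant byte and a set lane being any nonzero byte. The names `true_count`, `first_true` and `last_true` are illustrative only and are not HotSpot functions.

```cpp
#include <cstdint>

// Hypothetical model: 8 byte lanes packed into 64 bits, lane 0 in the
// least significant byte; a lane is "set" if its byte is nonzero.
constexpr int kLanes = 8;  // plays the role of VLENGTH

// True if lane i of the mask is set.
constexpr bool lane_set(std::uint64_t mask, int i) {
    return ((mask >> (8 * i)) & 0xFF) != 0;
}

// trueCount(): number of set lanes.
constexpr int true_count(std::uint64_t mask) {
    int count = 0;
    for (int i = 0; i < kLanes; i++) {
        if (lane_set(mask, i)) count++;
    }
    return count;
}

// firstTrue(): index of the lowest set lane, or VLENGTH if none is set.
constexpr int first_true(std::uint64_t mask) {
    for (int i = 0; i < kLanes; i++) {
        if (lane_set(mask, i)) return i;
    }
    return kLanes;
}

// lastTrue(): index of the highest set lane, or -1 if none is set.
constexpr int last_true(std::uint64_t mask) {
    for (int i = kLanes - 1; i >= 0; i--) {
        if (lane_set(mask, i)) return i;
    }
    return -1;
}
```

For example, with lanes 1 and 5 set (`0x0000FF000000FF00`), `true_count`, `first_true` and `last_true` return 2, 1 and 5. With no lane set, they return 0, 8 and -1.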
src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2299:
> 2297: ins_encode %{
> 2298: // Revert the bits and count the leading zero bytes.
> 2299: __ negr(as_FloatRegister($tmp$$reg), __ T8B, as_FloatRegister($src$$reg));
Should that be "Reverse the bits"? In any case, the code calls rbit and then clz, presumably because you want to count the trailing zero bits. What does the negr do here?
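The rbit-then-clz idiom computes a trailing-zero count, because reversing the bits turns trailing zeros into leading zeros. The following C++ sketch models that with portable stand-ins for the two instructions. `reverse_bits`, `count_leading_zeros` and `first_true_lane` are hypothetical names, not HotSpot or compiler intrinsics.

The sketch assumes that an 8 x 8-bit mask has been moved into a 64-bit general-purpose register with lane 0 in the least significant byte. Under that assumption, dividing the trailing-zero count by 8 gives a firstTrue-style lane index.

```cpp
#include <cstdint>

// Portable stand-in for AArch64 RBIT (64-bit): bit i moves to bit 63 - i.
constexpr std::uint64_t reverse_bits(std::uint64_t x) {
    std::uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | ((x >> i) & 1);
    }
    return r;
}

// Portable stand-in for AArch64 CLZ (64-bit): returns 64 when x == 0.
constexpr int count_leading_zeros(std::uint64_t x) {
    int n = 0;
    for (int i = 63; i >= 0 && ((x >> i) & 1) == 0; i--) {
        n++;
    }
    return n;
}

// RBIT followed by CLZ counts the trailing zero bits of the original value.
constexpr int count_trailing_zeros(std::uint64_t x) {
    return count_leading_zeros(reverse_bits(x));
}

// Hypothetical firstTrue for 8 byte lanes: the trailing-zero bit count divided
// by 8 is the index of the lowest nonzero byte; an all-zero mask gives 64 / 8 = 8.
constexpr int first_true_lane(std::uint64_t mask) {
    return count_trailing_zeros(mask) / 8;
}
```

In this model, the lane index is the same whether a set lane is encoded as 0x01 or 0xFF, because in both cases the lowest set bit of that lane is its bit 0.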
-------------
PR: https://git.openjdk.java.net/jdk/pull/4699
More information about the hotspot-compiler-dev mailing list