RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v3]

Andrew Haley aph at openjdk.java.net
Fri Jul 9 08:28:54 UTC 2021


On Fri, 9 Jul 2021 06:44:14 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> The VectorMask query (`trueCount`, `firstTrue`, `lastTrue`) APIs can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
>> 
>> Here is the performance comparison data for the three APIs with and without this patch:
>> 
>> Benchmark                                        (bits) (inputs) Before       After      Gain  Units
>> MaskQueryOperationsBenchmark.testFirstTrueByte    128      1    42583.141   103900.253   2.44  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte    128      2    37158.470   108234.110   2.91  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte    128      3    42583.584   108235.231   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt     128      1    42583.625   108236.859   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt     128      2    42583.288   107368.205   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt     128      3    42583.673   108232.371   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong    128      1    42583.408   108232.617   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong    128      2    42583.443   107367.035   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong    128      3    42583.111   108236.036   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort   128      1    42583.536   108230.365   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort   128      2    41231.639   108239.148   2.62  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort   128      3    42583.630   108238.542   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte     128      1    42584.067   108238.989   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte     128      2    36845.596   108234.297   2.94  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte     128      3    42583.759   108237.501   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt      128      1    42583.319   108236.218   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt      128      2    42583.112   108234.516   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt      128      3    42583.340   108238.777   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong     128      1    42581.004   108233.701   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong     128      2    42583.266   108238.323   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong     128      3    42583.542   108234.327   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort    128      1    42583.552   108238.011   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort    128      2    41231.142   108237.919   2.63  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort    128      3    44784.270   108238.011   2.42  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte    128      1    37075.556   108233.571   2.92  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte    128      2    37527.370   108233.396   2.88  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte    128      3    36585.788   107372.032   2.93  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt     128      1    42583.608   108233.721   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt     128      2    42584.733   107369.578   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt     128      3    42583.623   107367.859   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong    128      1    42583.671   107368.004   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong    128      2    42583.661   108233.301   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong    128      3    42583.015   108232.783   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort   128      1    41229.280   108233.369   2.63  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort   128      2    41231.914   107366.904   2.60  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort   128      3    41231.734   108233.606   2.63  ops/ms
>> 
>> All VectorAPI jtreg tests pass when patch [2] is applied together with this one.
>> 
>> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
>> [2] https://github.com/openjdk/jdk17/pull/168
>> 
>> Tested tier1 and jdk:tier3.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove the beginning "negr" for "firsttrue,lasttrue"

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5340:

> 5338:   format %{ "vmask_firsttrue $dst, $src\t# vector (4I/4S/2I)" %}
> 5339:   ins_encode %{
> 5340:     // Input "src" is a vector of boolean with "0/1" as the element values.

Say "vector of bytes." I read this as a vector of bits.

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5381:

> 5379:     Label FIRST_TRUE_INDEX;
> 5380: 
> 5381:     // Move the lower 64-bits to a general register and check whether the

"64 bits". No hyphen.
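
To illustrate why "vector of bytes" is the clearer wording, here is a rough scalar sketch of the three queries on a 64-bit word that holds eight mask lanes, one byte per lane, each byte 0 or 1. This is not the code in the patch: the class name `MaskQuerySketch`, the method names, and the convention that lane 0 sits in the least significant byte are all assumptions made for the example. The results for an empty mask follow the `VectorMask` API, where `firstTrue` returns the lane count and `lastTrue` returns -1.

```java
// Hypothetical scalar model, not taken from aarch64_neon.ad.
// Assumption: 'lanes' packs 8 mask lanes, one byte each, value 0 or 1,
// with lane 0 in the least significant byte.
final class MaskQuerySketch {
    static final int LANES = 8;

    // Every set lane contributes exactly one set bit, so a bit count is a lane count.
    static int trueCount(long lanes) {
        return Long.bitCount(lanes);
    }

    // The lowest set bit sits at bit 8 * lane; dividing by 8 recovers the lane.
    // numberOfTrailingZeros(0) is 64, so an empty mask yields LANES (8).
    static int firstTrue(long lanes) {
        return Long.numberOfTrailingZeros(lanes) >>> 3;
    }

    // The highest set bit sits at bit 8 * lane, leaving 63 - 8 * lane leading zeros.
    // numberOfLeadingZeros(0) is 64, so an empty mask yields -1.
    static int lastTrue(long lanes) {
        return (LANES - 1) - (Long.numberOfLeadingZeros(lanes) >>> 3);
    }
}
```

With bytes as the element type, each query reduces to a single bit-counting operation on the packed word. If the input were a vector of bits, the shifts by 3 would be wrong, which is why the comment should say which one it is.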

-------------

PR: https://git.openjdk.java.net/jdk/pull/4699

