RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v5]

Xiaohong Gong xgong at openjdk.java.net
Wed Jul 14 07:09:12 UTC 2021


On Tue, 13 Jul 2021 06:02:12 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> The VectorMask query (`trueCount, firstTrue, lastTrue`) APIs can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the new added vector nodes.
>> 
>> Here is the performance comparison data for the three APIs with and without this patch:
>> 
>> Benchmark                                        (bits) (inputs) Before       After      Gain  Units
>> MaskQueryOperationsBenchmark.testFirstTrueByte    128      1    42583.141   103900.253   2.44  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte    128      2    37158.470   108234.110   2.91  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte    128      3    42583.584   108235.231   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt     128      1    42583.625   108236.859   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt     128      2    42583.288   107368.205   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt     128      3    42583.673   108232.371   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong    128      1    42583.408   108232.617   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong    128      2    42583.443   107367.035   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong    128      3    42583.111   108236.036   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort   128      1    42583.536   108230.365   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort   128      2    41231.639   108239.148   2.62  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort   128      3    42583.630   108238.542   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte     128      1    42584.067   108238.989   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte     128      2    36845.596   108234.297   2.94  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte     128      3    42583.759   108237.501   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt      128      1    42583.319   108236.218   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt      128      2    42583.112   108234.516   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt      128      3    42583.340   108238.777   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong     128      1    42581.004   108233.701   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong     128      2    42583.266   108238.323   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong     128      3    42583.542   108234.327   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort    128      1    42583.552   108238.011   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort    128      2    41231.142   108237.919   2.63  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort    128      3    44784.270   108238.011   2.42  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte    128      1    37075.556   108233.571   2.92  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte    128      2    37527.370   108233.396   2.88  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte    128      3    36585.788   107372.032   2.93  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt     128      1    42583.608   108233.721   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt     128      2    42584.733   107369.578   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt     128      3    42583.623   107367.859   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong    128      1    42583.671   107368.004   2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong    128      2    42583.661   108233.301   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong    128      3    42583.015   108232.783   2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort   128      1    41229.280   108233.369   2.63  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort   128      2    41231.914   107366.904   2.60  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort   128      3    41231.734   108233.606   2.63  ops/ms
>> 
>> All VectorAPI jtreg tests pass with patch [2] is applied together.
>> 
>> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
>> [2] https://github.com/openjdk/jdk17/pull/168
>> 
>> Tested tier1 and jdk:tier3.
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
> 
>  - Merge jdk:master into JDK-8269725
>  - Add more comments
>  - Remove the begining "negr" for "firsttrue,lasttrue"
>  - Merge branch 'jdk:master' into JDK-8269725
>  - 8269725: AArch64: Add VectorMask query implementation for NEON

Hi @theRealAph , all your significant comments have been addressed. Would you mind taking a look at this PR again? Thanks so much!

-------------

PR: https://git.openjdk.java.net/jdk/pull/4699


More information about the hotspot-compiler-dev mailing list