RFR: 8269725: AArch64: Add VectorMask query implementation for NEON

Xiaohong Gong xgong at openjdk.java.net
Wed Jul 7 01:11:08 UTC 2021


The VectorMask query APIs (`trueCount`, `firstTrue`, `lastTrue`) can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
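
For readers unfamiliar with the three queries, here is a minimal scalar sketch of what they compute, following the documented `VectorMask` semantics (`firstTrue` returns the vector length when no lane is set, `lastTrue` returns -1). The class name `MaskQueryModel` and the `boolean[]` lane representation are assumptions made for illustration only. This is not the NEON code from the patch.

```java
// Scalar reference model of the VectorMask queries (illustrative only).
// MaskQueryModel is a hypothetical name; a boolean[] stands in for the mask lanes.
final class MaskQueryModel {
    private MaskQueryModel() {}

    // Number of lanes that are set.
    static int trueCount(boolean[] lanes) {
        int count = 0;
        for (boolean lane : lanes) {
            if (lane) {
                count++;
            }
        }
        return count;
    }

    // Index of the first set lane, or lanes.length if no lane is set.
    static int firstTrue(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                return i;
            }
        }
        return lanes.length;
    }

    // Index of the last set lane, or -1 if no lane is set.
    static int lastTrue(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

For a four-lane mask `{false, true, false, true}`, this model gives a `trueCount` of 2, a `firstTrue` of 1, and a `lastTrue` of 3.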

Here is the performance comparison data for the three APIs with and without this patch:

Benchmark                                        (bits) (inputs) Before       After      Gain  Units
MaskQueryOperationsBenchmark.testFirstTrueByte    128      1    42583.141   103900.253   2.44  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueByte    128      2    37158.470   108234.110   2.91  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueByte    128      3    42583.584   108235.231   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueInt     128      1    42583.625   108236.859   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueInt     128      2    42583.288   107368.205   2.52  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueInt     128      3    42583.673   108232.371   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueLong    128      1    42583.408   108232.617   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueLong    128      2    42583.443   107367.035   2.52  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueLong    128      3    42583.111   108236.036   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueShort   128      1    42583.536   108230.365   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueShort   128      2    41231.639   108239.148   2.62  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueShort   128      3    42583.630   108238.542   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueByte     128      1    42584.067   108238.989   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueByte     128      2    36845.596   108234.297   2.94  ops/ms
MaskQueryOperationsBenchmark.testLastTrueByte     128      3    42583.759   108237.501   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueInt      128      1    42583.319   108236.218   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueInt      128      2    42583.112   108234.516   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueInt      128      3    42583.340   108238.777   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueLong     128      1    42581.004   108233.701   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueLong     128      2    42583.266   108238.323   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueLong     128      3    42583.542   108234.327   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueShort    128      1    42583.552   108238.011   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueShort    128      2    41231.142   108237.919   2.63  ops/ms
MaskQueryOperationsBenchmark.testLastTrueShort    128      3    44784.270   108238.011   2.42  ops/ms
MaskQueryOperationsBenchmark.testTrueCountByte    128      1    37075.556   108233.571   2.92  ops/ms
MaskQueryOperationsBenchmark.testTrueCountByte    128      2    37527.370   108233.396   2.88  ops/ms
MaskQueryOperationsBenchmark.testTrueCountByte    128      3    36585.788   107372.032   2.93  ops/ms
MaskQueryOperationsBenchmark.testTrueCountInt     128      1    42583.608   108233.721   2.54  ops/ms
MaskQueryOperationsBenchmark.testTrueCountInt     128      2    42584.733   107369.578   2.52  ops/ms
MaskQueryOperationsBenchmark.testTrueCountInt     128      3    42583.623   107367.859   2.52  ops/ms
MaskQueryOperationsBenchmark.testTrueCountLong    128      1    42583.671   107368.004   2.52  ops/ms
MaskQueryOperationsBenchmark.testTrueCountLong    128      2    42583.661   108233.301   2.54  ops/ms
MaskQueryOperationsBenchmark.testTrueCountLong    128      3    42583.015   108232.783   2.54  ops/ms
MaskQueryOperationsBenchmark.testTrueCountShort   128      1    41229.280   108233.369   2.63  ops/ms
MaskQueryOperationsBenchmark.testTrueCountShort   128      2    41231.914   107366.904   2.60  ops/ms
MaskQueryOperationsBenchmark.testTrueCountShort   128      3    41231.734   108233.606   2.63  ops/ms

All Vector API jtreg tests pass when patch [2] is applied together with this one.

[1] https://bugs.openjdk.java.net/browse/JDK-8256973
[2] https://github.com/openjdk/jdk17/pull/168

Tested tier1 and jdk:tier3.

-------------

Commit messages:
 - 8269725: AArch64: Add VectorMask query implementation for NEON

Changes: https://git.openjdk.java.net/jdk/pull/4699/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4699&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8269725
  Stats: 413 lines in 8 files changed: 316 ins; 24 del; 73 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4699.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4699/head:pull/4699

PR: https://git.openjdk.java.net/jdk/pull/4699

