RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v5]
Xiaohong Gong
xgong at openjdk.java.net
Tue Jul 13 06:02:12 UTC 2021
> The VectorMask query APIs (`trueCount`, `firstTrue`, `lastTrue`) can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
>
> Here is the performance comparison data for the three APIs with and without this patch:
>
> Benchmark                                        (bits)  (inputs)     Before       After  Gain  Units
> MaskQueryOperationsBenchmark.testFirstTrueByte      128         1  42583.141  103900.253  2.44  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueByte      128         2  37158.470  108234.110  2.91  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueByte      128         3  42583.584  108235.231  2.54  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueInt       128         1  42583.625  108236.859  2.54  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueInt       128         2  42583.288  107368.205  2.52  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueInt       128         3  42583.673  108232.371  2.54  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueLong      128         1  42583.408  108232.617  2.54  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueLong      128         2  42583.443  107367.035  2.52  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueLong      128         3  42583.111  108236.036  2.54  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueShort     128         1  42583.536  108230.365  2.54  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueShort     128         2  41231.639  108239.148  2.62  ops/ms
> MaskQueryOperationsBenchmark.testFirstTrueShort     128         3  42583.630  108238.542  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueByte       128         1  42584.067  108238.989  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueByte       128         2  36845.596  108234.297  2.94  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueByte       128         3  42583.759  108237.501  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueInt        128         1  42583.319  108236.218  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueInt        128         2  42583.112  108234.516  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueInt        128         3  42583.340  108238.777  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueLong       128         1  42581.004  108233.701  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueLong       128         2  42583.266  108238.323  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueLong       128         3  42583.542  108234.327  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueShort      128         1  42583.552  108238.011  2.54  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueShort      128         2  41231.142  108237.919  2.63  ops/ms
> MaskQueryOperationsBenchmark.testLastTrueShort      128         3  44784.270  108238.011  2.42  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountByte      128         1  37075.556  108233.571  2.92  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountByte      128         2  37527.370  108233.396  2.88  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountByte      128         3  36585.788  107372.032  2.93  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountInt       128         1  42583.608  108233.721  2.54  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountInt       128         2  42584.733  107369.578  2.52  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountInt       128         3  42583.623  107367.859  2.52  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountLong      128         1  42583.671  107368.004  2.52  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountLong      128         2  42583.661  108233.301  2.54  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountLong      128         3  42583.015  108232.783  2.54  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountShort     128         1  41229.280  108233.369  2.63  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountShort     128         2  41231.914  107366.904  2.60  ops/ms
> MaskQueryOperationsBenchmark.testTrueCountShort     128         3  41231.734  108233.606  2.63  ops/ms
>
> All VectorAPI jtreg tests pass when patch [2] is applied together with this one.
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
> [2] https://github.com/openjdk/jdk17/pull/168
>
> Tested tier1 and jdk:tier3.
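
For readers less familiar with the three queries named in the description, the following is a minimal scalar sketch of what they compute, modelling a mask as a plain `boolean[]` of lanes. It is not the NEON code from this patch and does not use the Vector API itself; the class name `MaskQuerySketch` is hypothetical. It assumes the documented `VectorMask` conventions as I understand them: `firstTrue` yields the lane count when no lane is set, and `lastTrue` yields -1 in that case.

```java
// Scalar reference model of the VectorMask queries (illustrative only).
final class MaskQuerySketch {

    // Number of lanes that are set.
    static int trueCount(boolean[] mask) {
        int count = 0;
        for (boolean lane : mask) {
            if (lane) {
                count++;
            }
        }
        return count;
    }

    // Index of the first set lane, or mask.length if none is set (assumed convention).
    static int firstTrue(boolean[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                return i;
            }
        }
        return mask.length;
    }

    // Index of the last set lane, or -1 if none is set (assumed convention).
    static int lastTrue(boolean[] mask) {
        for (int i = mask.length - 1; i >= 0; i--) {
            if (mask[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, false, true};
        System.out.println("trueCount = " + trueCount(mask));
        System.out.println("firstTrue = " + firstTrue(mask));
        System.out.println("lastTrue  = " + lastTrue(mask));
    }
}
```

An intrinsified backend replaces loops of this kind with a short vector instruction sequence, which is where the gains in the table above come from.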
Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
- Merge jdk:master into JDK-8269725
- Add more comments
- Remove the begining "negr" for "firsttrue,lasttrue"
- Merge branch 'jdk:master' into JDK-8269725
- 8269725: AArch64: Add VectorMask query implementation for NEON
-------------
Changes: https://git.openjdk.java.net/jdk/pull/4699/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4699&range=04
Stats: 461 lines in 8 files changed: 360 ins; 24 del; 77 mod
Patch: https://git.openjdk.java.net/jdk/pull/4699.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4699/head:pull/4699
PR: https://git.openjdk.java.net/jdk/pull/4699