RFR: 8269725: AArch64: Add VectorMask query implementation for NEON [v5]
Andrew Haley
aph at openjdk.java.net
Wed Jul 14 10:18:17 UTC 2021
On Tue, 13 Jul 2021 06:02:12 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Now that [1] is closed, the VectorMask query APIs (`trueCount`, `firstTrue`, `lastTrue`) can be intrinsified. This patch adds the Arm NEON backend implementation for the newly added vector nodes.
>>
>> Here is the performance comparison data for the three APIs with and without this patch:
>>
>> Benchmark                                       (bits)  (inputs)      Before       After  Gain  Units
>> MaskQueryOperationsBenchmark.testFirstTrueByte     128         1   42583.141  103900.253  2.44  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte     128         2   37158.470  108234.110  2.91  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueByte     128         3   42583.584  108235.231  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt      128         1   42583.625  108236.859  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt      128         2   42583.288  107368.205  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt      128         3   42583.673  108232.371  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong     128         1   42583.408  108232.617  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong     128         2   42583.443  107367.035  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong     128         3   42583.111  108236.036  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort    128         1   42583.536  108230.365  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort    128         2   41231.639  108239.148  2.62  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueShort    128         3   42583.630  108238.542  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte      128         1   42584.067  108238.989  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte      128         2   36845.596  108234.297  2.94  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueByte      128         3   42583.759  108237.501  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt       128         1   42583.319  108236.218  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt       128         2   42583.112  108234.516  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueInt       128         3   42583.340  108238.777  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong      128         1   42581.004  108233.701  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong      128         2   42583.266  108238.323  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueLong      128         3   42583.542  108234.327  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort     128         1   42583.552  108238.011  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort     128         2   41231.142  108237.919  2.63  ops/ms
>> MaskQueryOperationsBenchmark.testLastTrueShort     128         3   44784.270  108238.011  2.42  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte     128         1   37075.556  108233.571  2.92  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte     128         2   37527.370  108233.396  2.88  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountByte     128         3   36585.788  107372.032  2.93  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt      128         1   42583.608  108233.721  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt      128         2   42584.733  107369.578  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountInt      128         3   42583.623  107367.859  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong     128         1   42583.671  107368.004  2.52  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong     128         2   42583.661  108233.301  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountLong     128         3   42583.015  108232.783  2.54  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort    128         1   41229.280  108233.369  2.63  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort    128         2   41231.914  107366.904  2.60  ops/ms
>> MaskQueryOperationsBenchmark.testTrueCountShort    128         3   41231.734  108233.606  2.63  ops/ms
>>
>> All VectorAPI jtreg tests pass when patch [2] is applied together with this one.
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8256973
>> [2] https://github.com/openjdk/jdk17/pull/168
>>
>> Tested tier1 and jdk:tier3.
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Merge jdk:master into JDK-8269725
> - Add more comments
> - Remove the beginning "negr" for "firsttrue,lasttrue"
> - Merge branch 'jdk:master' into JDK-8269725
> - 8269725: AArch64: Add VectorMask query implementation for NEON
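For readers less familiar with the API, the three queries named in the quoted description have simple scalar semantics: trueCount counts the set lanes, firstTrue returns the index of the first set lane (or the vector length if none is set), and lastTrue returns the index of the last set lane (or -1 if none is set). The C++ sketch below is only a reference model of those semantics; representing the mask as a plain bool array is an assumption for illustration and is not how HotSpot represents masks.

```cpp
// Reference model of the VectorMask query semantics.
// Assumption: the mask is a plain bool array of vlength lanes.

// Number of lanes that are set.
int true_count(const bool* mask, int vlength) {
  int n = 0;
  for (int i = 0; i < vlength; i++) {
    if (mask[i]) n++;
  }
  return n;
}

// Index of the first set lane, or vlength if no lane is set.
int first_true(const bool* mask, int vlength) {
  for (int i = 0; i < vlength; i++) {
    if (mask[i]) return i;
  }
  return vlength;
}

// Index of the last set lane, or -1 if no lane is set.
int last_true(const bool* mask, int vlength) {
  for (int i = vlength - 1; i >= 0; i--) {
    if (mask[i]) return i;
  }
  return -1;
}
```

For an 8-lane mask with only lanes 1 and 4 set, this model gives trueCount 2, firstTrue 1 and lastTrue 4; for an all-clear 4-lane mask it gives 0, 4 and -1.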
It's looking good; just a few minor issues.
src/hotspot/cpu/aarch64/aarch64.ad line 1320:
> 1318: const TypeVect* vt = def->bottom_type()->is_vect();
> 1319: return vt->length();
> 1320: }
There doesn't seem to be anything AArch64-specific about these functions. I suppose that if no-one else uses them they can stay in aarch64.ad, but putting them there doesn't make much sense.
src/hotspot/cpu/aarch64/aarch64_neon.ad line 5355:
> 5353: __ lsrw($dst$$Register, $dst$$Register, 3);
> 5354: __ movw(rscratch1, vector_length(this, $src));
> 5355: __ cmpw($dst$$Register, rscratch1);
You should be able to use `cmpw($dst$$Register, vector_length(this, $src));` here, avoiding the move into rscratch1, as long as `operand_valid_for_add_sub_immediate(vector_length(this, $src))` is true.
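The condition comes down to whether the vector length fits the AArch64 ADD/SUB (immediate) encoding, which CMP (an alias of SUBS) also uses: a 12-bit unsigned immediate, optionally shifted left by 12. The C++ sketch below models that check. `valid_add_sub_imm` is a hypothetical name, and treating negative values by their magnitude (so they can be emitted with the opposite instruction, e.g. CMN instead of CMP) is an assumption about how the assembler handles the sign, not a copy of HotSpot's code.

```cpp
#include <cstdint>

// Hypothetical model of an "is this a valid add/sub immediate?" check.
// AArch64 ADD/SUB (immediate) encodes imm12, optionally shifted by 12.
// Assumption: negative values are judged by magnitude, since the
// assembler can swap ADD/SUB (or CMP/CMN) to absorb the sign.
bool valid_add_sub_imm(int64_t imm) {
  // Magnitude computed in unsigned arithmetic so INT64_MIN is safe.
  uint64_t u = imm < 0 ? (uint64_t)0 - (uint64_t)imm : (uint64_t)imm;
  if (u < (1u << 12)) {
    return true;                      // fits imm12 with LSL #0
  }
  if (u < (1u << 24) && (u & 0xfff) == 0) {
    return true;                      // fits imm12 with LSL #12
  }
  return false;
}
```

A NEON vector has at most 16 lanes, so the vector length is always far below 4096 and this check should always succeed for these rules.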
src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2318:
> 2316: %}
> 2317: ins_pipe(pipe_slow);
> 2318: %}
Why write `vmask_firsttrue8B` and `vmask_firsttrue_LT8B` separately? All you need is `if (vector_length < 8)` in the encoding rule.
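The single-rule approach can be modelled in scalar C++. This is an illustrative sketch only, assuming the byte mask has been moved into a 64-bit general register with lane i in bits [8*i, 8*i+7], each lane either 0x00 or 0xFF, and any bytes beyond the vector length zero. The trailing-zero loop stands in for an instruction sequence such as rbit followed by clz, and `first_true_bytes` is a hypothetical name. The point it illustrates is that only the short-vector case needs the extra clamp against the vector length, so one rule guarded by a length check can cover both shapes.

```cpp
#include <cstdint>

// Illustrative model of one firstTrue rule for byte masks of up to 8 lanes.
// Assumptions: lane i occupies bits [8*i, 8*i+7], lanes are 0x00 or 0xFF,
// and bytes at or above lane vlength are zero.
int first_true_bytes(uint64_t bits, int vlength) {
  // Count trailing zero bits; an all-zero mask yields 64.
  int tz = 0;
  while (tz < 64 && ((bits >> tz) & 1) == 0) {
    tz++;
  }
  int idx = tz >> 3;  // bit index -> byte lane index (like lsrw #3)
  // An empty 8-lane mask already gives 64 >> 3 == 8 == vlength, but an
  // empty 4-lane mask gives 8 rather than 4, so only short vectors clamp.
  if (vlength < 8 && idx > vlength) {
    idx = vlength;
  }
  return idx;
}
```

In an .ad encoding block the `vlength < 8` test would be evaluated when the code is emitted, so the extra instructions would only be generated for short vectors; in this model it is simply an ordinary branch.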
-------------
PR: https://git.openjdk.java.net/jdk/pull/4699
More information about the hotspot-compiler-dev mailing list