RFR: 8283307: Vectorize unsigned shift right on signed subword types [v2]
Fei Gao
fgao at openjdk.java.net
Fri Apr 22 11:15:18 UTC 2022
On Wed, 20 Apr 2022 04:04:34 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Remove related comments in some test files
>>
>> Change-Id: I5dd1c156bd80221dde53737e718da0254c5381d8
>> - Merge branch 'master' into fg8283307
>>
>> Change-Id: Ic4645656ea156e8cac993995a5dc675aa46cb21a
>> - 8283307: Vectorize unsigned shift right on signed subword types
>>
>> ```
>> public short[] vectorUnsignedShiftRight(short[] shorts) {
>> short[] res = new short[SIZE];
>> for (int i = 0; i < SIZE; i++) {
>> res[i] = (short) (shorts[i] >>> 3);
>> }
>> return res;
>> }
>> ```
>> In C2's SLP, vectorization of unsigned shift right on signed
>> subword types (byte/short) like the case above is intentionally
>> disabled[1]. Because the vector unsigned shift on signed
>> subword types behaves differently from the Java spec. It's
>> worthy to vectorize more cases in quite low cost. Also,
>> unsigned shift right on signed subword is not uncommon and we
>> may find similar cases in Lucene benchmark[2].
>>
>> Taking unsigned right shift on short type as an example,
>>
>> Short:
>> | <- 16 bits -> | <- 16 bits -> |
>> | 1 1 1 ... 1 1 | data |
>>
>> when the shift amount is a constant not greater than the number
>> of sign extended bits, 16 higher bits for short type shown like
>> above, the unsigned shift on signed subword types can be
>> transformed into a signed shift and hence becomes vectorizable.
>> Here is the transformation:
>>
>> For T_SHORT (shift <= 16):
>> src RShiftCntV shift src RShiftCntV shift
>> \ / ==> \ /
>> URShiftVS RShiftVS
>>
>> This patch does the transformation in SuperWord::implemented() and
>> SuperWord::output(). It helps vectorize the short cases above. We
>> can handle unsigned right shift on byte type in a similar way. The
>> generated assembly code for one iteration on aarch64 is like:
>> ```
>> ...
>> sbfiz x13, x10, #1, #32
>> add x15, x11, x13
>> ldr q16, [x15, #16]
>> sshr v16.8h, v16.8h, #3
>> add x13, x17, x13
>> str q16, [x13, #16]
>> ...
>> ```
>>
>> Here is the performance data for micro-benchmark before and after
>> this patch on both AArch64 and x64 machines. We can observe about
>> ~80% improvement with this patch.
>>
>> The perf data on AArch64:
>> Before the patch:
>> Benchmark (SIZE) (shiftCount) Mode Cnt Score Error Units
>> urShiftImmByte 1024 3 avgt 5 295.711 ± 0.117 ns/op
>> urShiftImmShort 1024 3 avgt 5 284.559 ± 0.148 ns/op
>>
>> after the patch:
>> Benchmark (SIZE) (shiftCount) Mode Cnt Score Error Units
>> urShiftImmByte 1024 3 avgt 5 45.111 ± 0.047 ns/op
>> urShiftImmShort 1024 3 avgt 5 55.294 ± 0.072 ns/op
>>
>> The perf data on X86:
>> Before the patch:
>> Benchmark (SIZE) (shiftCount) Mode Cnt Score Error Units
>> urShiftImmByte 1024 3 avgt 5 361.374 ± 4.621 ns/op
>> urShiftImmShort 1024 3 avgt 5 365.390 ± 3.595 ns/op
>>
>> After the patch:
>> Benchmark (SIZE) (shiftCount) Mode Cnt Score Error Units
>> urShiftImmByte 1024 3 avgt 5 105.489 ± 0.488 ns/op
>> urShiftImmShort 1024 3 avgt 5 43.400 ± 0.394 ns/op
>>
>> [1] https://github.com/openjdk/jdk/blob/002e3667443d94e2303c875daf72cf1ccbbb0099/src/hotspot/share/opto/vectornode.cpp#L190
>> [2] https://github.com/jpountz/decode-128-ints-benchmark/
>>
>> Change-Id: I9bd0cfdfcd9c477e8905a4c877d5e7ff14e39161
>
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizeURShiftSubword.java line 114:
>
>> 112: testByte0();
>> 113: for (int i = 0; i < bytea.length; i++) {
>> 114: Asserts.assertEquals(byteb[i], (byte) (bytea[i] >>> 3));
>
> I'm still a bit worried about the test.
>
> Suggestion:
> Rewrite
>
> Asserts.assertEquals(byteb[i], (byte) (bytea[i] >>> 3));
>
>
> to
>
> Asserts.assertEquals(byteb[i], urshift(bytea[i], 3)));
>
> And disable inlining during the testing.
Done. Thanks very much @DamonFool .
-------------
PR: https://git.openjdk.java.net/jdk/pull/7979
More information about the hotspot-compiler-dev
mailing list