Integrated: 8255949: AArch64: Add support for vectorized shift right and accumulate
Dong Bo
dongbo at openjdk.java.net
Tue Nov 10 01:28:57 UTC 2020
On Fri, 6 Nov 2020 03:36:57 GMT, Dong Bo <dongbo at openjdk.org> wrote:
> This adds support for the missing NEON shift-right-and-accumulate instructions, i.e. SSRA and USRA, to the AArch64 backend.
>
> Verified with linux-aarch64-server-release, tier1-3.
>
> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance testing.
> We see average time per operation drop by roughly 14% to 20% across the vectorized basic types on Kunpeng 916. The JMH results:
> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
> # before, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±  4.938  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op  (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
> # after shift right and accumulate, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op  (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>
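
For readers without the benchmark source at hand, the loop shape being measured is roughly "shift right, then add". The following is a minimal sketch, not the actual benchmark: the class name, method names, and input values are hypothetical, and whether C2 emits SSRA/USRA for these exact loops depends on the JIT and the hardware.

```java
import java.util.Arrays;

// Hypothetical illustration of the "shift right and accumulate" pattern.
final class ShiftAccumulateSketch {

    // Signed form: d[i] = a[i] + (b[i] >> shift), narrowed back to byte.
    // This is the kind of loop an SSRA-style instruction could cover.
    static byte[] shiftRightAccumulate(byte[] a, byte[] b, int shift) {
        byte[] d = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            d[i] = (byte) (a[i] + (b[i] >> shift));
        }
        return d;
    }

    // Unsigned form on int: d[i] = a[i] + (b[i] >>> shift).
    // This is the kind of loop a USRA-style instruction could cover.
    static int[] shiftURightAccumulate(int[] a, int[] b, int shift) {
        int[] d = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            d[i] = a[i] + (b[i] >>> shift);
        }
        return d;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4};
        byte[] b = {8, -8, 16, -16};
        System.out.println(Arrays.toString(shiftRightAccumulate(a, b, 2)));

        int[] ia = {1, 2};
        int[] ib = {8, -8};
        System.out.println(Arrays.toString(shiftURightAccumulate(ia, ib, 2)));
    }
}
```

Without a fused instruction, each element needs a separate shift and add; the fused form does both in one vector instruction, which is consistent with the lower ns/op in the "after" rows above.
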
> We checked the shiftURightAccumulateByte test. Its performance stays the same because it is not vectorized with or without this patch, due to:
> src/hotspot/share/opto/vectornode.cpp, line 226:
>   case Op_URShiftI:
>     switch (bt) {
>     case T_BOOLEAN:return Op_URShiftVB;
>     case T_CHAR:   return Op_URShiftVS;
>     case T_BYTE:
>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>                              // values produces incorrect Java result for
>                              // negative data because java code should convert
>                              // a short value into int value with sign
>                              // extension before a shift.
>     case T_INT:    return Op_URShiftVI;
>     default:       ShouldNotReachHere(); return 0;
>     }
> We also tried the existing vector operation micro urShiftB, i.e.:
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>     @Benchmark
>     public void urShiftB() {
>         for (int i = 0; i < COUNT; i++) {
>             resB[i] = (byte) (bytesA[i] >>> 3);
>         }
>     }
> It is not vectorized either. It seems hard to match Java code with the URShiftVB node.
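
The sign-extension issue described in the vectornode.cpp comment can be reproduced in plain Java. The sketch below is only an illustration with hypothetical class and method names: `javaShift` follows the language semantics (promote to int with sign extension, shift, narrow), while `laneShift` models what a lane-wise 8-bit logical shift would compute if it were applied directly to the byte.

```java
// Hypothetical illustration of why (byte) (b >>> n) cannot simply be
// mapped to a per-byte logical right shift.
final class UnsignedByteShiftSketch {

    // Java semantics: b is sign-extended to int before the shift,
    // then the result is narrowed back to byte.
    static byte javaShift(byte b, int shift) {
        return (byte) (b >>> shift);
    }

    // Model of a lane-wise 8-bit logical shift: the byte is treated as
    // an unsigned 8-bit value, so zeros are shifted in from bit 7.
    static byte laneShift(byte b, int shift) {
        return (byte) ((b & 0xFF) >>> shift);
    }

    public static void main(String[] args) {
        byte negative = -8; // bit pattern 0xF8
        byte positive = 8;  // bit pattern 0x08
        System.out.println(javaShift(negative, 3) + " vs " + laneShift(negative, 3));
        System.out.println(javaShift(positive, 3) + " vs " + laneShift(positive, 3));
    }
}
```

For the negative input the two results differ (the Java expression keeps the sign bits that were shifted in from the int's upper bits), while for the non-negative input they agree. That mismatch is the reason the T_BYTE and T_SHORT cases return 0 above.
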
This pull request has now been integrated.
Changeset: f71f9dc9
Author: Dong Bo <dongbo at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL: https://git.openjdk.java.net/jdk/commit/f71f9dc9
Stats: 349 lines in 3 files changed: 349 ins; 0 del; 0 mod
8255949: AArch64: Add support for vectorized shift right and accumulate
Reviewed-by: aph
-------------
PR: https://git.openjdk.java.net/jdk/pull/1087