RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10]
Dong Bo
dongbo at openjdk.java.net
Thu Feb 25 01:47:40 UTC 2021
On Wed, 24 Feb 2021 07:31:14 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>>> Hi, @theRealAph
>>>
>>> I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`.
>>> Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions?
>>
>> I'm not seeing ```sra``` used anywhere.
>>
>> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug.
>>
>> It would be far better to do something like this:
>> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) {
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i);
>> }
>
>> > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions?
>>
>> I'm not seeing `sra` used anywhere.
>>
>> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug.
>>
>> It would be far better to do something like this:
>>
>> ```
>> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) {
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i);
>> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i);
>> }
>> ```
>
>
> Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server:
> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt
> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt
> $ cat assembly_vlen*.txt | grep "ssra"
> 02c0 ssra V18, V17, #37 # vector (2D)
> 02c8 ssra V19, V17, #0 # vector (2D)
> 02d0 ssra V20, V17, #35 # vector (2D)
> 0308 ssra V18, V17, #29 # vector (2D)
> 0644 ssra V18, V17, #37 # vector (2D)
> 064c ssra V19, V17, #0 # vector (2D)
> 0654 ssra V20, V17, #35 # vector (2D)
> 0674 ssra V18, V17, #29 # vector (2D)
> 0798 ssra V18, V17, #37 # vector (2D)
> 07a0 ssra V19, V17, #0 # vector (2D)
> 07a8 ssra V20, V17, #35 # vector (2D)
> 07e0 ssra V18, V17, #29 # vector (2D)
> 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
> 0x0000ffff83f7e510: ssra v20.2d, v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0}
> 0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0}
> 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37
> 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35
> 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29
> 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0}
> 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0}
> 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
> 284 ssra V18, V17, #9 # vector (4S)
> 28c ssra V19, V17, #0 # vector (4S)
> 294 ssra V20, V17, #15 # vector (4S)
> 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0}
> 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15
> 284 ssra V18, V17, #1 # vector (8H)
> 28c ssra V19, V17, #8 # vector (8H)
> ...
>
> Also injected error to `sshr+add` by:
> --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp
> +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp
> @@ -545,7 +545,7 @@ public:
> #define WRAP(INSN) \
> void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \
> if (shift == 0) { \
> - Assembler::addv(Vd, T, Vd, Vn); \
> + Assembler::subv(Vd, T, Vd, Vn); \
> } else { \
> Assembler::INSN(Vd, T, Vn, shift); \
> } \
> The `shift+add` tests failed as expected:
> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64
> WARNING: Using incubator modules: jdk.incubator.vector
> warning: using incubating module(s): jdk.incubator.vector
> 1 warning
> Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests:
> type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64.
> type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64.
> type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64.
> type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64.
> type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64.
> type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64.
> type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64.
> type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64.
> type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64.
> type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64.
> type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64.
> type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64.
> ...
> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128
> WARNING: Using incubator modules: jdk.incubator.vector
> warning: using incubating module(s): jdk.incubator.vector
> 1 warning
> Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests:
> type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128.
> type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128.
> type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128.
> type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128.
> type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128.
> type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128.
> type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128.
> type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128.
> type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128.
> ...
>
> Anyway, I extracted operations you suggested into `shift_op_*` methods.
> Performed the error-injected experiments with the new tests on Kunpeng916 and re-checked the assembly output, results looks good.
>
> The test command I used to run the newest tests are:
> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt
> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt
> $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra
> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> On 24/02/2021 07:33, Dong Bo wrote:
>
> > Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server:
>
> I don't doubt it, but the test code is so very complex that it can
> fall foul of heuristics given slightly changed circumstances. That's
> why good test cases are as simple as possible, and allow no room for
> variations because they do only one thing. Precise targeting should
> be the goal of HotSpot back-end test cases.
>
Understood, thanks. :-)
Does the newest version address the considerations?
I extracted the `shift`/`shift+add` operations into separate methods, mostly as suggested in previous comments, something like:
static int shift_op_long_ASHR_and_ADD(LongVector vba, LongVector vbb, long arrLongs[][], int end, int ind) {
vba.add(vbb.lanewise(VectorOperators.ASHR, 37)).intoArray(arrLongs[end++], ind);
vba.add(vbb.lanewise(VectorOperators.ASHR, 64)).intoArray(arrLongs[end++], ind);
vba.add(vbb.lanewise(VectorOperators.ASHR, 99)).intoArray(arrLongs[end++], ind);
vba.add(vbb.lanewise(VectorOperators.ASHR, 128)).intoArray(arrLongs[end++], ind);
vba.add(vbb.lanewise(VectorOperators.ASHR, 157)).intoArray(arrLongs[end++], ind);
vba.add(vbb.lanewise(VectorOperators.ASHR, 192)).intoArray(arrLongs[end++], ind);
return end;
}
-------------
PR: https://git.openjdk.java.net/jdk/pull/2472
More information about the hotspot-compiler-dev
mailing list