RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6]

Thu Feb 18 09:37:43 UTC 2021

On Thu, 18 Feb 2021 07:57:25 GMT, Dong Bo <dongbo at openjdk.org> wrote:

>> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
>> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>> 
>> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
>> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
>> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
>> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
>> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>> 
>> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
>> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>> 
>> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
>> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.
>
> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - add tests in aarch64-asmtest.py
>  - fix windows operator precedence error and cleanup testcase
>  - Merge branch 'master' into aarch64_vector_api_shift
>  - fix windows build failure
>  - generate add if shift == 0 for accumulation and fix some test code
>  - back out AD modifications and handle zero shift in assembler

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2709:

> 2707:       f(encodedShift, 22, 16); f(opc2, 15, 10), rf(Vn, 5), rf(Vd, 0);   \
> 2708:     }                                                                   \
> 2709:   }

Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472