RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width

Tue Feb 9 07:55:43 UTC 2021

On Tue, 9 Feb 2021 06:55:50 GMT, Dong Bo <dongbo at openjdk.org> wrote:

> In the Vector API, when right-shifting a vector by an amount equal to the element width, the shift amount is masked to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
> 
> The aarch64 assembler generates wrong or illegal instructions in this case. For example, for the Java code below on aarch64,
> the assembler call `__ ushr(dst, __ T8B, src, 0)` does not generate `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, the JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vba.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
> 
> On aarch64, a legal right-shift amount is in the range 1 to the element width in bits:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
> 
> This fix handles a zero shift separately: it generates `orr` for a plain right shift and `addv` for a right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also added a jtreg test that reproduces the issue and serves as a regression test.
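
To make the `ushr dst.4H, src.4H, 16` result above easier to follow, here is a minimal Java model of how a vector right-shift immediate ends up in the 7-bit `immh:immb` field. It is only a sketch: the class and method names are invented for illustration and none of this is HotSpot assembler code. It assumes the Arm encoding rule that the field holds `2 * esize - shift` and that the highest set bit of `immh` selects the element width.

```java
// Hypothetical model of the A64 SIMD right-shift immediate (immh:immb) field.
// All names are illustrative; this is not the HotSpot assembler.
final class ShiftImmediateSketch {

    /** Mask the shift count as VectorOperators documents it: n & (ESIZE*8-1). */
    static int maskShift(int n, int esizeBits) {
        return n & (esizeBits - 1);
    }

    /** Pack a right shift into the 7-bit immh:immb field, with no range check. */
    static int encode(int esizeBits, int shift) {
        return (2 * esizeBits - shift) & 0x7f;
    }

    /** Decode immh:immb into {element width in bits, shift amount}. */
    static int[] decode(int immhImmb) {
        int immh = immhImmb >> 3;
        if (immh == 0) {
            throw new IllegalArgumentException("immh == 0 is not a shift encoding");
        }
        // The highest set bit of immh selects the element width: 8, 16, 32 or 64.
        int esizeBits = 8 << (31 - Integer.numberOfLeadingZeros(immh));
        return new int[] { esizeBits, 2 * esizeBits - immhImmb };
    }

    public static void main(String[] args) {
        int sh = maskShift(8, 8);      // byte lanes shifted by 8 are masked to 0
        int field = encode(8, sh);     // 16 - 0 = 16, which no longer fits the 8-bit form
        int[] decoded = decode(field);
        System.out.println("masked shift = " + sh);
        System.out.println("decoded as " + decoded[0] + "-bit lanes, shift " + decoded[1]);
    }
}
```

In this model a byte shift of 0 is packed as 16, which decodes as 16-bit lanes shifted by 16, matching the instruction reported above. The same arithmetic for 32-bit lanes gives a field that decodes as 64-bit lanes shifted by 64.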

Thanks for the fix.

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285:

> 5283:   ins_encode %{
> 5284:     int sh = (int)$shift$$constant;
> 5285:     if (sh == 0) {

If src and dst are the same register, there is no need to emit any code. Or maybe C2 could even be improved to optimize the sh == 0 case out?

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271:

> 5269:     } else {
> 5270:       if (sh >= 8) sh = 7;
> 5271:       __ sshr(as_FloatRegister($dst$$reg), __ T8B,

I think we should add an assert to make sure 0 is not passed to the assembler.
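
A guard of the kind suggested here could look roughly like the sketch below. It is written in Java only so that it runs on its own; the real check would sit in the C++ assembler, and the class and method names are hypothetical. The accepted range, 1 to the element width in bits, is taken from the Arm documentation linked in the description.

```java
// Hypothetical range check modelling an assembler-side guard; not HotSpot code.
final class RightShiftGuardSketch {

    /** Returns the shift unchanged if it is encodable, otherwise throws. */
    static int checkedRightShift(int esizeBits, int shift) {
        if (shift < 1 || shift > esizeBits) {
            throw new IllegalArgumentException(
                "right shift " + shift + " out of range 1.." + esizeBits);
        }
        return shift;
    }

    public static void main(String[] args) {
        // Accepted: a shift equal to the lane width is still encodable.
        System.out.println(checkedRightShift(8, 8));
        try {
            // Rejected: a zero shift has to be handled by the caller.
            checkedRightShift(8, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With such a check in place, a zero shift that slips past the special case in the `.ad` file would fail loudly in debug builds instead of silently producing a different instruction.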

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472