[jdk16] RFR: 8259353: VectorReinterpretNode is incorrectly optimized out

Vladimir Ivanov vlivanov at openjdk.java.net
Mon Jan 11 12:59:00 UTC 2021


On Mon, 11 Jan 2021 09:23:24 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> Vector reinterpretation just reinterprets the bytes of the vector without performing any value conversions. So normally, optimization like:
>   (VectorReinterpret (VectorReinterpret src))  ->  src
> can be applied correctly if the logical result and the input `"src"` have the same vector type. However, the results might not be correct if truncation happens after the first reinterpretation: `"(VectorReinterpret src)"`.
> 
> Consider the following case:
>   byte[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
>   byte[] b = new byte[16]
>   byte[] c = new byte[16]
> 
>   // load vector from "a" with 128-bits
>   ByteVector av = ByteVector.fromArray(SPECIES_128, a, 0);
> 
>   // reinterpret to 64-bits
>   ByteVector bv = (ByteVector)av.reinterpretShape(SPECIES_64, 0);
> 
>   // reinterpret back to 128-bits with the above reinterpreting results
>   ByteVector cv = (ByteVector)bv.reinterpretShape(SPECIES_128, 0);
>   cv.intoArray(c, 0)
> This case:
>   1. Reinterprets vector `"av"` from 128-bits to 64-bits. It should only copy the first 8 elements to vector `"bv"` and discard the higher half parts.
>   2. Reinterprets vector `"bv"` back to 128-bits. It copies the 64-bits from` "bv"` to` "cv"`, and paddings the rest part of `"cv"` with zero.
>   The final values in array `"c"` are expected to be:
>   c = [ 0, 1, 2, 3, 4, 5, 6, 7, 0,  0, 0, 0, 0, 0, 0, 0]
> However, with the optimization mentioned at the beginning, the second reinterpretation is optimized out. The values in array` "c" `are incorrectly copied from array `"a"`. The values are:
>    c = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
> To fix it, this patch adds the vector size constraint for the optimization, that is the first reinterpretation must not be conducted to a shorter vector.
> 
> It also fixes a potential issue for the implementation of match rule `"reinterpretX2D (from 16 bytes to 8 bytes)" `on Arm NEON. Specifically, the `"mov"` is always needed even if the` "dst"` and `"src"` are the same register since truncation should be conducted in order to be consistent with the semantics.

Marked as reviewed by vlivanov (Reviewer).

Good catch! The fix in shared code looks good.

I suggest to split up the patch and review AArch64-specific changes separately.

-------------

PR: https://git.openjdk.java.net/jdk16/pull/100


More information about the hotspot-compiler-dev mailing list