[jdk16] Integrated: 8259353: VectorReinterpretNode is incorrectly optimized out
Xiaohong Gong
xgong at openjdk.java.net
Wed Jan 13 05:52:59 UTC 2021
On Mon, 11 Jan 2021 09:23:24 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> Vector reinterpretation just reinterprets the bytes of the vector without performing any value conversions. So normally, an optimization like:
> (VectorReinterpret (VectorReinterpret src)) -> src
> can be applied correctly if the logical result and the input `"src"` have the same vector type. However, the results might not be correct if truncation happens after the first reinterpretation: `"(VectorReinterpret src)"`.
>
> Consider the following case:
> byte[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
> byte[] b = new byte[16];
> byte[] c = new byte[16];
>
> // load vector from "a" with 128-bits
> ByteVector av = ByteVector.fromArray(SPECIES_128, a, 0);
>
> // reinterpret to 64-bits
> ByteVector bv = (ByteVector)av.reinterpretShape(SPECIES_64, 0);
>
> // reinterpret back to 128-bits with the above reinterpreting results
> ByteVector cv = (ByteVector)bv.reinterpretShape(SPECIES_128, 0);
> cv.intoArray(c, 0);
> This case:
> 1. Reinterprets vector `"av"` from 128 bits to 64 bits. It should copy only the first 8 elements to vector `"bv"` and discard the upper half.
> 2. Reinterprets vector `"bv"` back to 128 bits. It copies the 64 bits from `"bv"` to `"cv"`, and pads the rest of `"cv"` with zeros.
> The final values in array `"c"` are expected to be:
> c = [ 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0]
> However, with the optimization mentioned at the beginning, the second reinterpretation is optimized out, and the values in array `"c"` are incorrectly copied from array `"a"`. The values are:
> c = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
> To fix it, this patch adds a vector size constraint to the optimization: the first reinterpretation must not produce a vector shorter than its input.
>
> It also fixes a potential issue in the implementation of the match rule `"reinterpretX2D (from 16 bytes to 8 bytes)"` on Arm NEON. Specifically, the `"mov"` is always needed, even if `"dst"` and `"src"` are the same register, because the truncation must be performed to stay consistent with the semantics.
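
The truncation and zero-padding semantics described in the quoted text, together with the size constraint the patch adds, can be modeled in plain Java without the Vector API. This is only a sketch, not the HotSpot C2 code: the class `ReinterpretFoldSketch` and the helpers `reinterpret`, `canFold`, and `roundTrip` are hypothetical names, vectors are modeled as byte arrays, and "same vector type" is reduced to "same size in bytes".

```java
import java.util.Arrays;

public class ReinterpretFoldSketch {
    // Models a shape-changing reinterpret at part 0: keep the low bytes,
    // truncate if the target is shorter, zero-pad if it is longer.
    static byte[] reinterpret(byte[] src, int targetBytes) {
        return Arrays.copyOf(src, targetBytes);
    }

    // Hypothetical guard for folding
    //   (VectorReinterpret (VectorReinterpret src)) -> src
    // Assumption: the fold is safe only when the final result has the same
    // size as src AND the intermediate vector is not shorter than src.
    static boolean canFold(int srcBytes, int midBytes, int dstBytes) {
        return srcBytes == dstBytes && midBytes >= srcBytes;
    }

    // Reinterprets src to midBytes and back to its original size,
    // taking the folded shortcut only when the guard allows it.
    static byte[] roundTrip(byte[] src, int midBytes) {
        if (canFold(src.length, midBytes, src.length)) {
            return src.clone(); // both reinterprets cancel out
        }
        return reinterpret(reinterpret(src, midBytes), src.length);
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        // 128 -> 64 -> 128 bits: the upper half is lost, so no folding.
        System.out.println(Arrays.toString(roundTrip(a, 8)));
        // 128 -> 256 -> 128 bits: nothing is truncated, folding is safe.
        System.out.println(Arrays.toString(roundTrip(a, 32)));
    }
}
```

The first call prints `[0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0]`, matching the expected contents of `"c"`. The second prints the original 16 values, since widening first loses no data.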
This pull request has now been integrated.
Changeset: 1cf2378b
Author: Xiaohong Gong <xgong at openjdk.org>
Committer: Ningsheng Jian <njian at openjdk.org>
URL: https://git.openjdk.java.net/jdk16/commit/1cf2378b
Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod
8259353: VectorReinterpretNode is incorrectly optimized out
Reviewed-by: vlivanov, njian
-------------
PR: https://git.openjdk.java.net/jdk16/pull/100