[jdk16] RFR: 8259353: VectorReinterpretNode is incorrectly optimized out [v2]

Xiaohong Gong xgong at openjdk.java.net
Tue Jan 12 02:52:13 UTC 2021


> Vector reinterpretation just reinterprets the bytes of the vector without performing any value conversions. So normally, optimization like:
>   (VectorReinterpret (VectorReinterpret src))  ->  src
> can be applied correctly if the logical result and the input `"src"` have the same vector type. However, the results might not be correct if truncation happens after the first reinterpretation: `"(VectorReinterpret src)"`.
> 
> Consider the following case:
>   byte[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
>   byte[] b = new byte[16]
>   byte[] c = new byte[16]
> 
>   // load vector from "a" with 128-bits
>   ByteVector av = ByteVector.fromArray(SPECIES_128, a, 0);
> 
>   // reinterpret to 64-bits
>   ByteVector bv = (ByteVector)av.reinterpretShape(SPECIES_64, 0);
> 
>   // reinterpret back to 128-bits with the above reinterpreting results
>   ByteVector cv = (ByteVector)bv.reinterpretShape(SPECIES_128, 0);
>   cv.intoArray(c, 0)
> This case:
>   1. Reinterprets vector `"av"` from 128-bits to 64-bits. It should only copy the first 8 elements to vector `"bv"` and discard the higher half parts.
>   2. Reinterprets vector `"bv"` back to 128-bits. It copies the 64-bits from` "bv"` to` "cv"`, and paddings the rest part of `"cv"` with zero.
>   The final values in array `"c"` are expected to be:
>   c = [ 0, 1, 2, 3, 4, 5, 6, 7, 0,  0, 0, 0, 0, 0, 0, 0]
> However, with the optimization mentioned at the beginning, the second reinterpretation is optimized out. The values in array` "c" `are incorrectly copied from array `"a"`. The values are:
>    c = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
> To fix it, this patch adds the vector size constraint for the optimization, that is the first reinterpretation must not be conducted to a shorter vector.
> 
> It also fixes a potential issue for the implementation of match rule `"reinterpretX2D (from 16 bytes to 8 bytes)" `on Arm NEON. Specifically, the `"mov"` is always needed even if the` "dst"` and `"src"` are the same register since truncation should be conducted in order to be consistent with the semantics.

Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:

  Revert Arm NEON match rule fixes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk16/pull/100/files
  - new: https://git.openjdk.java.net/jdk16/pull/100/files/47a4978f..6857bdd1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=100&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=100&range=00-01

  Stats: 39 lines in 2 files changed: 1 ins; 15 del; 23 mod
  Patch: https://git.openjdk.java.net/jdk16/pull/100.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/100/head:pull/100

PR: https://git.openjdk.java.net/jdk16/pull/100


More information about the hotspot-compiler-dev mailing list