RFR: 8289422: Fix and re-enable vector conditional move [v3]

Fri Aug 12 06:39:06 UTC 2022

> // float[] a, float[] b, float[] c;
> for (int i = 0; i < a.length; i++) {
>     c[i] = (a[i] > b[i]) ? a[i] : b[i];
> }
> 
> 
> After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case
> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov.
> But the transformation here[1] optimizes a BoolNode with a constant
> input to a constant, which breaks the design logic of the
> cmove vector node[2]. We can't prevent every GVN transformation of
> the BoolNode before the matcher, so the patch keeps the condition input
> as a constant while creating a cmove vector node, and then
> restructures it into a binary tree before matching.
> 
> When the input order of the original cmp node differs from the
> input order of the original cmove node, as in:
> 
> // float[] a, float[] b, float[] c;
> for (int i = 0; i < a.length; i++) {
>     c[i] = (a[i] < b[i]) ? a[i] : b[i];
> }
> 
> the patch negates the mask of the BoolNode before creating the
> cmove vector node in SuperWord::output().
> 
> We can also use VectorNode::implemented() to check whether vector
> conditional move is supported in the backend, so the patch cleans
> up the related code in SuperWord::implemented().
> 
> With the patch, the performance uplift is:
> (The micro-benchmark functions are included in the file
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
> 
> AArch64:
> Benchmark (length)  Mode  Cnt   uplift(ns/op)
> cmoveD     523      avgt  15    68.89%
> cmoveF     523      avgt  15    72.40%
> 
> X86:
> Benchmark (length)  Mode  Cnt   uplift(ns/op)
> cmoveD     523      avgt  15    73.12%
> cmoveF     523      avgt  15    85.45%
> 
> [1] https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310
> [2] https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365
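
For readers who want to try the two loops from the description, here is a minimal, self-contained Java sketch. It is only a scalar illustration, not code from the patch: the class name `CmoveKernels`, the helper `select`, and the method `selectLessNegated` are hypothetical names introduced here. The last method shows one way to read "negates the mask": selecting with the logically negated mask and swapped operands gives the same result as the original select.

```java
import java.util.Arrays;

// Hypothetical illustration class; none of these names come from the patch.
class CmoveKernels {
    // First loop from the description: the compare and the select
    // use their operands in the same order (a, b).
    public static void selectGreater(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = (a[i] > b[i]) ? a[i] : b[i];
        }
    }

    // Second loop from the description.
    public static void selectLess(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = (a[i] < b[i]) ? a[i] : b[i];
        }
    }

    // Scalar stand-in for a per-lane select (assumption: a simplification,
    // not how the vector node is implemented).
    public static float select(boolean mask, float ifTrue, float ifFalse) {
        return mask ? ifTrue : ifFalse;
    }

    // Same result as selectLess, computed with the logically negated mask
    // and the two select operands swapped.
    public static void selectLessNegated(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            boolean mask = a[i] < b[i];
            c[i] = select(!mask, b[i], a[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 5.0f, 3.0f, Float.NaN};
        float[] b = {2.0f, 4.0f, 3.0f, 0.0f};
        float[] c = new float[a.length];
        float[] d = new float[a.length];
        selectLess(a, b, c);
        selectLessNegated(a, b, d);
        System.out.println(Arrays.equals(c, d)); // prints "true"
    }
}
```

To experiment with vectorization, the flags named in the description can be passed on the command line, for example `java -XX:+UseCMoveUnconditionally -XX:+UseVectorCmov CmoveKernels.java`, assuming a C2-enabled build that accepts them. Whether the loops are actually vectorized depends on the platform and on the build containing this change.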

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Merge branch 'master' into fg8289422

   Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412
 - Enable the test only on the aarch64 platform because X86 supports vector cmove only on some 256-bit AVX configurations

   Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8
 - Merge branch 'master' into fg8289422

   Change-Id: I7936552df6ac12949ed8b550576f4e3520596423
 - 8289422: Fix and re-enable vector conditional move

   Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690

-------------

Changes: https://git.openjdk.org/jdk/pull/9652/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9652&range=02
  Stats: 290 lines in 9 files changed: 275 ins; 7 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/9652.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9652/head:pull/9652

PR: https://git.openjdk.org/jdk/pull/9652