RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes

Mon Sep 5 10:31:19 UTC 2022

Recently we found that the Vector API rotate left/right benchmarks emit
a redundant "and" instruction on both aarch64 and x86_64 machines.  For
example, and(and(a, b), b) generates two "and" instructions, which can
be reduced to a single and(a, b) because "and" (like "or") is
commutative, associative and idempotent.  This can improve performance
for workloads that apply multiple "and"/"or" operations with the same
value, by reducing them to fewer "and"/"or" operations.

This patch adds the following transformations for the vector logical
operations AndV and OrV:

(OpV (OpV a b) b) => (OpV a b)
(OpV (OpV a b) a) => (OpV a b)
(OpV (OpV a b m1) b m1) => (OpV a b m1)
(OpV (OpV a b m1) a m1) => (OpV a b m1)
(OpV a (OpV a b)) => (OpV a b)
(OpV b (OpV a b)) => (OpV a b)
(OpV a (OpV a b m) m) => (OpV a b m)
where Op = "And", "Or"
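
The identities above can be checked with a small scalar model. This is
only an illustrative sketch and not the C2 code from the patch: the
class and method names are hypothetical, vectors are modelled as int
arrays, and the masked form is assumed to keep the first operand in
lanes where the mask is unset.

```java
import java.util.Arrays;

// Hypothetical lane-wise model of AndV/OrV; not HotSpot code.
final class RedundantLogicalOpModel {
    interface LaneOp { int apply(int x, int y); }

    static final LaneOp AND = (x, y) -> x & y;
    static final LaneOp OR  = (x, y) -> x | y;

    // Unmasked (OpV a b): apply the operation to every lane.
    static int[] opV(LaneOp op, int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = op.apply(a[i], b[i]);
        }
        return r;
    }

    // Masked (OpV a b m). Assumption: lanes with an unset mask bit
    // keep the value of the first operand.
    static int[] opV(LaneOp op, int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? op.apply(a[i], b[i]) : a[i];
        }
        return r;
    }

    // Returns true if all seven listed transformations preserve the
    // result for the given inputs.
    static boolean identitiesHold(LaneOp op, int[] a, int[] b, boolean[] m) {
        int[] ab  = opV(op, a, b);
        int[] abm = opV(op, a, b, m);
        return Arrays.equals(opV(op, ab, b), ab)
            && Arrays.equals(opV(op, ab, a), ab)
            && Arrays.equals(opV(op, abm, b, m), abm)
            && Arrays.equals(opV(op, abm, a, m), abm)
            && Arrays.equals(opV(op, a, ab), ab)
            && Arrays.equals(opV(op, b, ab), ab)
            && Arrays.equals(opV(op, a, abm, m), abm);
    }

    public static void main(String[] args) {
        int[] a = {0x0F0F, -1, 42, 0};
        int[] b = {0x00FF, 7, 0, 0x1234};
        boolean[] m = {true, false, true, false};
        System.out.println("AndV: " + identitiesHold(AND, a, b, m));
        System.out.println("OrV:  " + identitiesHold(OR, a, b, m));
    }
}
```

Under the same assumption, the model also suggests why the list has no
masked counterpart with b as the first operand: in unset lanes
(OpV b (OpV a b m) m) would yield b, while (OpV a b m) yields a.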

Links to the benchmarks tested are given below:
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764

Before this patch, the disassembly for one of these test cases
(IntMaxVector.ROR) on Neon is shown below:

  ldr     q16, [x12, #16]
  and   v16.16b, v16.16b, v20.16b
  and   v16.16b, v16.16b, v20.16b
  add   x12, x16, x11
  sub   v17.4s, v21.4s, v16.4s
  ...
  ...

After this patch, the disassembly for the same test case is shown
below:

  ldr     q16, [x12, #16]
  and   v16.16b, v16.16b, v20.16b
  add   x12, x16, x11
  sub   v17.4s, v21.4s, v16.4s
  ...
  ...

Without this patch, the other tests also emit the same extra "and"
instruction for the vector ROR/ROL operations.

Below are the performance results for the Vector API rotate tests (the
tests linked above) with this patch on aarch64 and x86_64 machines, for
int and long types:

Benchmark                aarch64   x86_64
IntMaxVector.ROL         25.57%    26.09%
IntMaxVector.ROR         23.75%    24.15%
LongMaxVector.ROL        28.91%    28.51%
LongMaxVector.ROR        16.51%    29.11%

Each percentage is the gain in throughput (ops/ms) with this patch
relative to the master build without it. The machine descriptions are
given below:
aarch64 - 128-bit aarch64 machine
x86_64  - 256-bit x86 machine

-------------

Commit messages:
 - 8292675: Add identity transformation for removing redundant AndV/OrV nodes

Changes: https://git.openjdk.org/jdk/pull/10163/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10163&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8292675
  Stats: 287 lines in 2 files changed: 285 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/10163.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10163/head:pull/10163

PR: https://git.openjdk.org/jdk/pull/10163