RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes
Bhavana Kilambi
bkilambi at openjdk.org
Mon Sep 5 10:31:19 UTC 2022
Recently we found that the rotate left/right benchmarks with vectorapi
emit a redundant "and" instruction on both aarch64 and x86_64 machines,
which can be eliminated. For example, and(and(a, b), b) generates
two "and" instructions, which can be reduced to a single "and" operation,
and(a, b), since "and" (and "or") operations are associative, commutative
and idempotent. This can improve performance for workloads that apply
multiple "and"/"or" operations with the same value, by reducing them to
fewer "and"/"or" operations.
This patch adds the following transformations for the vector logical
operations AndV and OrV:
(OpV (OpV a b) b) => (OpV a b)
(OpV (OpV a b) a) => (OpV a b)
(OpV (OpV a b m1) b m1) => (OpV a b m1)
(OpV (OpV a b m1) a m1) => (OpV a b m1)
(OpV a (OpV a b)) => (OpV a b)
(OpV b (OpV a b)) => (OpV a b)
(OpV a (OpV a b m) m) => (OpV a b m)
where Op = "And", "Or"
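The seven patterns above can be sanity-checked lane by lane on plain int arrays. The sketch below is not the C2 implementation from the patch; the class and method names (RedundantLogicCheck, opV, identitiesHold) are made up for illustration, and it assumes that in a masked operation the lanes with an unset mask bit keep the value of the first operand, as the Vector API's masked lanewise operations do.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntBinaryOperator;

public class RedundantLogicCheck {
    // Lane-wise unmasked operation: (OpV a b).
    static int[] opV(IntBinaryOperator op, int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = op.applyAsInt(a[i], b[i]);
        }
        return r;
    }

    // Lane-wise masked operation: (OpV a b m).
    // Assumption: lanes whose mask bit is unset keep the first operand's value.
    static int[] opV(IntBinaryOperator op, int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? op.applyAsInt(a[i], b[i]) : a[i];
        }
        return r;
    }

    // True if each of the seven patterns gives the same lanes as
    // (OpV a b) or (OpV a b m) for the given inputs.
    static boolean identitiesHold(IntBinaryOperator op, int[] a, int[] b, boolean[] m) {
        int[] ab = opV(op, a, b);
        int[] abm = opV(op, a, b, m);
        return Arrays.equals(opV(op, ab, b), ab)         // (OpV (OpV a b) b)
            && Arrays.equals(opV(op, ab, a), ab)         // (OpV (OpV a b) a)
            && Arrays.equals(opV(op, abm, b, m), abm)    // (OpV (OpV a b m) b m)
            && Arrays.equals(opV(op, abm, a, m), abm)    // (OpV (OpV a b m) a m)
            && Arrays.equals(opV(op, a, ab), ab)         // (OpV a (OpV a b))
            && Arrays.equals(opV(op, b, ab), ab)         // (OpV b (OpV a b))
            && Arrays.equals(opV(op, a, abm, m), abm);   // (OpV a (OpV a b m) m)
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int lanes = 8;
        int[] a = rnd.ints(lanes).toArray();
        int[] b = rnd.ints(lanes).toArray();
        boolean[] m = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            m[i] = rnd.nextBoolean();
        }
        System.out.println("And: " + identitiesHold((x, y) -> x & y, a, b, m));
        System.out.println("Or:  " + identitiesHold((x, y) -> x | y, a, b, m));
    }
}
```

Passing a non-idempotent operation such as xor to identitiesHold makes the check fail, which is why the transformations are restricted to "And" and "Or".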
Links to the benchmarks tested are given below:
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764
Before this patch, the disassembly for one of these testcases
(IntMaxVector.ROR) for Neon is shown below:
```
ldr q16, [x12, #16]
and v16.16b, v16.16b, v20.16b
and v16.16b, v16.16b, v20.16b
add x12, x16, x11
sub v17.4s, v21.4s, v16.4s
...
...
```
After this patch, the disassembly for the same testcase is shown
below:
```
ldr q16, [x12, #16]
and v16.16b, v16.16b, v20.16b
add x12, x16, x11
sub v17.4s, v21.4s, v16.4s
...
...
```
The other tests also emit an extra "and" instruction as shown above for
the vector ROR/ROL operations.
Below are the performance results for the vectorapi rotate tests (tests
given in the links above) with this patch on aarch64 and x86_64 machines
(for int and long types) -
Benchmark            aarch64    x86_64
IntMaxVector.ROL     25.57%     26.09%
IntMaxVector.ROR     23.75%     24.15%
LongMaxVector.ROL    28.91%     28.51%
LongMaxVector.ROR    16.51%     29.11%
The percentage indicates the percent gain/improvement in performance
(ops/ms) with this patch over the master build without this patch. The
machine descriptions are given below -
aarch64 - 128-bit aarch64 machine
x86_64 - 256-bit x86 machine
-------------
Commit messages:
- 8292675: Add identity transformation for removing redundant AndV/OrV nodes
Changes: https://git.openjdk.org/jdk/pull/10163/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10163&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8292675
Stats: 287 lines in 2 files changed: 285 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/10163.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10163/head:pull/10163
PR: https://git.openjdk.org/jdk/pull/10163