RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes
Tobias Hartmann
thartmann at openjdk.org
Thu Sep 8 08:24:56 UTC 2022
On Mon, 5 Sep 2022 10:21:11 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
> Recently we found that the rotate left/right benchmarks with vectorapi
> emit a redundant "and" instruction on both aarch64 and x86_64 machines
> which can be done away with. For example - and(and(a, b), b) generates
> two "and" instructions which can be reduced to a single "and" operation-
> and(a, b) since "and" (and "or") operations are commutative and
> idempotent in nature. This can help improve performance for all those
> workloads which have multiple "and"/"or" operations with the same value
> by reducing them to fewer "and"/"or" operations accordingly.
>
> This patch adds the following transformations for vector logical
> operations - AndV and OrV :
>
>
> (OpV (OpV a b) b) => (OpV a b)
> (OpV (OpV a b) a) => (OpV a b)
> (OpV (OpV a b m1) b m1) => (OpV a b m1)
> (OpV (OpV a b m1) a m1) => (OpV a b m1)
> (OpV a (OpV a b)) => (OpV a b)
> (OpV b (OpV a b)) => (OpV a b)
> (OpV a (OpV a b m) m) => (OpV a b m)
>
> where Op = "And", "Or"
>
> Links for benchmarks tested are given below :-
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764
>
> Before this patch, the disassembly for one these testcases
> (IntMaxVector.ROR) for Neon is shown below :
> ```
> ldr q16, [x12, #16]
> and v16.16b, v16.16b, v20.16b
> and v16.16b, v16.16b, v20.16b
> add x12, x16, x11
> sub v17.4s, v21.4s, v16.4s
> ...
> ...
>
>
> After this patch, the disassembly for the same testcase above is shown
> below :
>
> ldr q16, [x12, #16]
> and v16.16b, v16.16b, v20.16b
> add x12, x16, x11
> sub v17.4s, v21.4s, v16.4s
> ...
> ...
>
>
> The other tests also emit an extra "and" instruction as shown above for
> the vector ROR/ROL operations.
>
> Below are the performance results for the vectorapi rotate tests (tests
> given in the links above) with this patch on aarch64 and x86_64 machines
> (for int and long types) -
>
>
> Benchmark aarch64 x86_64
> IntMaxVector.ROL 25.57% 26.09%
> IntMaxVector.ROR 23.75% 24.15%
> LongMaxVector.ROL 28.91% 28.51%
> LongMaxVector.ROR 16.51% 29.11%
>
>
>
> The percentage indicates the percent gain/improvement in performance
> (ops/ms) with this patch over the master build without this patch. The
> machine descriptions are given below -
> aarch64 - 128-bit aarch64 machine
> x86_64 - 256-bit x86 machine
Changes requested by thartmann (Reviewer).
src/hotspot/share/opto/vectornode.cpp line 1893:
> 1891: if (((!n->is_predicated_vector() && !n1->is_predicated_vector()) ||
> 1892: (n->is_predicated_vector() && n1->is_predicated_vector() &&
> 1893: n->in(3) == n1->in(3))) && (n->in(2) == n1->in(1) || n->in(2) == n1->in(2))) {
This condition is hard to parse, I would suggest:
Suggestion:
if (((!n->is_predicated_vector() && !n1->is_predicated_vector()) ||
( n->is_predicated_vector() && n1->is_predicated_vector() && n->in(3) == n1->in(3))) &&
(n->in(2) == n1->in(1) || n->in(2) == n1->in(2))) {
src/hotspot/share/opto/vectornode.cpp line 1903:
> 1901: // (OperationV src2 (OperationV src1 src2)) => OperationV(src1, src2)
> 1902: if (!n->is_predicated_vector() && !n2->is_predicated_vector() &&
> 1903: (n->in(1) == n2->in(1) || n->in(1) == n2->in(2))) {
Suggestion:
if (!n->is_predicated_vector() && !n2->is_predicated_vector() &&
(n->in(1) == n2->in(1) || n->in(1) == n2->in(2))) {
src/hotspot/share/opto/vectornode.cpp line 1907:
> 1905: // (OperationV src1 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)
> 1906: } else if (n->is_predicated_vector() && n2->is_predicated_vector() &&
> 1907: n->in(3) == n2->in(3) && n->in(1) == n2->in(1)) {
I think this should be merged into line 1902. Why did you omit the `(OperationV src2 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)` case?
-------------
PR: https://git.openjdk.org/jdk/pull/10163
More information about the hotspot-compiler-dev
mailing list