RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes [v2]
Tobias Hartmann
thartmann at openjdk.org
Fri Sep 9 08:03:47 UTC 2022
On Thu, 8 Sep 2022 16:28:32 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> Recently we found that the rotate left/right benchmarks with the Vector API
>> emit a redundant "and" instruction on both aarch64 and x86_64 machines.
>> For example, and(and(a, b), b) generates two "and" instructions, which can
>> be reduced to a single and(a, b) because "and" (and "or") operations are
>> associative, commutative and idempotent. This can help improve performance
>> for workloads that apply multiple "and"/"or" operations with the same
>> value by reducing them to fewer "and"/"or" operations.
>>
>> This patch adds the following identity transformations for the vector
>> logical operations AndV and OrV:
>>
>>
>> (OpV (OpV a b) b) => (OpV a b)
>> (OpV (OpV a b) a) => (OpV a b)
>> (OpV (OpV a b m1) b m1) => (OpV a b m1)
>> (OpV (OpV a b m1) a m1) => (OpV a b m1)
>> (OpV a (OpV a b)) => (OpV a b)
>> (OpV b (OpV a b)) => (OpV a b)
>> (OpV a (OpV a b m) m) => (OpV a b m)
>>
>> where Op = "And", "Or"
>>
>> Links to the benchmarks tested are given below:
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764
>>
>> Before this patch, the disassembly for one of these testcases
>> (IntMaxVector.ROR) for Neon is shown below:
>> ```
>> ldr q16, [x12, #16]
>> and v16.16b, v16.16b, v20.16b
>> and v16.16b, v16.16b, v20.16b
>> add x12, x16, x11
>> sub v17.4s, v21.4s, v16.4s
>> ...
>> ...
>> ```
>>
>> After this patch, the disassembly for the same testcase is shown
>> below:
>> ```
>> ldr q16, [x12, #16]
>> and v16.16b, v16.16b, v20.16b
>> add x12, x16, x11
>> sub v17.4s, v21.4s, v16.4s
>> ...
>> ...
>> ```
>>
>> The other tests also emit an extra "and" instruction as shown above for
>> the vector ROR/ROL operations.
>>
>> Below are the performance results for the Vector API rotate tests (the
>> tests linked above) with this patch on aarch64 and x86_64 machines
>> (for int and long types):
>>
>>
>> Benchmark            aarch64    x86_64
>> IntMaxVector.ROL     25.57%     26.09%
>> IntMaxVector.ROR     23.75%     24.15%
>> LongMaxVector.ROL    28.91%     28.51%
>> LongMaxVector.ROR    16.51%     29.11%
>>
>>
>>
>> Each percentage is the improvement in throughput (ops/ms) with this
>> patch over the master build without this patch. The machine
>> descriptions are given below:
>> aarch64 - 128-bit aarch64 machine
>> x86_64 - 256-bit x86 machine
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
> Merge two if conditions and some trivial changes
Looks good to me. Testing in our CI passed.
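
For readers who want to convince themselves of the seven rewrites quoted above, here is a minimal sketch that brute-forces them on a single scalar lane. It is not the C2 code from the PR. The class and method names are hypothetical, and the masked semantics (an unset mask lane keeps the first operand) are an assumption based on how masked lanewise operations behave in the Vector API.

```java
import java.util.function.LongBinaryOperator;

public class RedundantLogicIdentity {
    // Unmasked lane operation: plain f(a, b).
    static long op(LongBinaryOperator f, long a, long b) {
        return f.applyAsLong(a, b);
    }

    // Masked lane operation.
    // Assumption: when the mask lane is unset, the first operand passes through.
    static long op(LongBinaryOperator f, long a, long b, boolean m) {
        return m ? f.applyAsLong(a, b) : a;
    }

    // Returns true if all seven rewrites hold for every 4-bit a and b
    // and for both mask values.
    static boolean holdsFor(LongBinaryOperator f) {
        for (long a = 0; a < 16; a++) {
            for (long b = 0; b < 16; b++) {
                long ab = op(f, a, b);
                if (op(f, ab, b) != ab) return false;  // (OpV (OpV a b) b)
                if (op(f, ab, a) != ab) return false;  // (OpV (OpV a b) a)
                if (op(f, a, ab) != ab) return false;  // (OpV a (OpV a b))
                if (op(f, b, ab) != ab) return false;  // (OpV b (OpV a b))
                for (boolean m : new boolean[] {false, true}) {
                    long abm = op(f, a, b, m);
                    if (op(f, abm, b, m) != abm) return false;  // (OpV (OpV a b m) b m)
                    if (op(f, abm, a, m) != abm) return false;  // (OpV (OpV a b m) a m)
                    if (op(f, a, abm, m) != abm) return false;  // (OpV a (OpV a b m) m)
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("AND: " + holdsFor((x, y) -> x & y));  // prints "AND: true"
        System.out.println("OR:  " + holdsFor((x, y) -> x | y));  // prints "OR:  true"
        System.out.println("XOR: " + holdsFor((x, y) -> x ^ y));  // prints "XOR: false"
    }
}
```

XOR is included only as a negative control: it is not idempotent, so the rewrites do not apply to it. Under the same masking assumption, a masked form with b as the first operand of the outer node would not simplify, because unset lanes would yield b rather than a, which is consistent with the list containing only one masked right-nested rule.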
-------------
Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10163