RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes [v2]

Nick Gasson ngasson at openjdk.org
Fri Sep 9 11:59:29 UTC 2022


On Thu, 8 Sep 2022 16:28:32 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> Recently we found that the rotate left/right benchmarks with vectorapi
>> emit a redundant "and" instruction on both aarch64 and x86_64 machines
>> which can be done away with.  For example - and(and(a, b), b) generates
>> two "and" instructions which can be reduced to a single "and" operation-
>> and(a, b) since "and" (and "or") operations are commutative and
>> idempotent in nature.  This can help improve performance for all those
>> workloads which have multiple "and"/"or" operations with the same value
>> by reducing them to fewer "and"/"or" operations accordingly.
>> 
>> This patch adds the following transformations for vector logical
>> operations - AndV and OrV :
>> 
>> 
>> (OpV (OpV a b) b) => (OpV a b)
>> (OpV (OpV a b) a) => (OpV a b)
>> (OpV (OpV a b m1) b m1) => (OpV a b m1)
>> (OpV (OpV a b m1) a m1) => (OpV a b m1)
>> (OpV a (OpV a b)) => (OpV a b)
>> (OpV b (OpV a b)) => (OpV a b)
>> (OpV a (OpV a b m) m) => (OpV a b m)
>> 
>> where Op = "And", "Or"
>> 
>> Links for benchmarks tested are given below :-
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764
>> 
>> Before this patch, the disassembly for one these testcases
>> (IntMaxVector.ROR) for Neon is shown below :
>>  ```
>>   ldr     q16, [x12, #16]
>>   and     v16.16b, v16.16b, v20.16b
>>   and     v16.16b, v16.16b, v20.16b
>>   add     x12, x16, x11
>>   sub     v17.4s, v21.4s, v16.4s
>>   ...
>>   ...
>> 
>> 
>> After this patch, the disassembly for the same testcase above is shown
>> below :
>> 
>>   ldr     q16, [x12, #16]
>>   and     v16.16b, v16.16b, v20.16b
>>   add     x12, x16, x11
>>   sub     v17.4s, v21.4s, v16.4s
>>   ...
>>   ...
>> 
>> 
>> The other tests also emit an extra "and" instruction as shown above for
>> the vector ROR/ROL operations.
>> 
>> Below are the performance results for the vectorapi rotate tests (tests
>> given in the links above) with this patch on aarch64 and x86_64 machines
>> (for int and long types) -
>> 
>> 
>> Benchmark                aarch64   x86_64
>> IntMaxVector.ROL         25.57%    26.09%
>> IntMaxVector.ROR         23.75%    24.15%
>> LongMaxVector.ROL        28.91%    28.51%
>> LongMaxVector.ROR        16.51%    29.11%
>> 
>> 
>> 
>> The percentage indicates the percent gain/improvement in performance
>> (ops/ms) with this patch over the master build without this patch. The
>> machine descriptions are given below -
>> aarch64 - 128-bit aarch64 machine
>> x86_64  - 256-bit x86 machine
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Merge two if conditions and some trivial changes

Marked as reviewed by ngasson (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/10163


More information about the hotspot-compiler-dev mailing list