[jdk17u-dev] RFR: 8265263: AArch64: Combine vneg with right shift count [v2]

Thu Oct 20 12:46:08 UTC 2022

On Thu, 20 Oct 2022 11:55:24 GMT, Dmitry Chuyko <dchuyko at openjdk.org> wrote:

>> This is a performance improvement for AArch64. There are several differences from the original change.
>> 
>> https://bugs.openjdk.org/browse/JDK-8267356 (Vector API SVE codegen support) is not in 17u, so `UseSVE == 0` parts in predicates are missing/excluded.
>> 
>> https://bugs.openjdk.org/browse/JDK-8288445 (C2 compilation fails) is a subsequent bugfix already backported in 17u, so some `immI` arguments in rules became `immI_positive`.
>> 
>> https://bugs.openjdk.org/browse/JDK-8277239 (SIGSEGV in vrshift_reg_maskedNode::emit) is also related to Vector API and is not in 17u, so `!n->as_ShiftV()->is_var_shift()` is replaced by `VectorNode::is_vshift_cnt(n->in(2))`. This substitution may raise doubts.
>> 
>> Testing: jtreg test/hotspot/jtreg/compiler, tier1, tier2 on aarch64.
>> 
>> Performance improvements in the added benchmark VectorShiftRight on Graviton 2 for default size=1024 correspond to the original review:
>> 
>> 
>> rShiftByte      16%
>> rShiftInt       27%
>> rShiftLong      16%
>> rShiftShort     20%
>> urShiftByte     0%
>> urShiftChar     20%
>> urShiftInt      27%
>> urShiftLong     16%
>
> Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision:
> 
>   No SVE checks in vsrcnt8B, vsrcnt16B

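I noticed that the AD file generated from the M4 file differs from the AD file provided in this PR.
The difference is limited to an extra pair of parentheses and the indentation in the predicates of vslcnt8B and vsrcnt8B.
I think we should eliminate the difference.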


~/jdk/src/hotspot/cpu/aarch64$ m4 aarch64_neon_ad.m4 > aarch64_neon.ad
~/jdk/src/hotspot/cpu/aarch64$ git diff
 src/hotspot/cpu/aarch64/aarch64_neon.ad | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/hotspot/cpu/aarch64/aarch64_neon.ad b/src/hotspot/cpu/aarch64/aarch64_neon.ad
index db50e08fffd..d43c8d31b78 100644
--- a/src/hotspot/cpu/aarch64/aarch64_neon.ad
+++ b/src/hotspot/cpu/aarch64/aarch64_neon.ad
@@ -4250,8 +4250,8 @@ instruct vxor16B(vecX dst, vecX src1, vecX src2)
 //         on vsra8B rule for more details.

 instruct vslcnt8B(vecD dst, iRegIorL2I cnt) %{
-  predicate(n->as_Vector()->length_in_bytes() == 4 ||
-                            n->as_Vector()->length_in_bytes() == 8);
+  predicate((n->as_Vector()->length_in_bytes() == 4 ||
+              n->as_Vector()->length_in_bytes() == 8));
   match(Set dst (LShiftCntV cnt));
   ins_cost(INSN_COST);
   format %{ "dup  $dst, $cnt\t# shift count vector (8B)" %}
@@ -4273,8 +4273,8 @@ instruct vslcnt16B(vecX dst, iRegIorL2I cnt) %{
 %}

 instruct vsrcnt8B(vecD dst, iRegIorL2I cnt) %{
-  predicate(n->as_Vector()->length_in_bytes() == 4 ||
-            n->as_Vector()->length_in_bytes() == 8);
+  predicate((n->as_Vector()->length_in_bytes() == 4 ||
+             n->as_Vector()->length_in_bytes() == 8));
   match(Set dst (RShiftCntV cnt));
   ins_cost(INSN_COST * 2);
   format %{ "negw  rscratch1, $cnt\t"

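The check above overwrites the checked-in file and relies on git diff to show the result. A minimal sketch of the same check that leaves the working tree untouched is given below. The function name check_ad_in_sync and the M4 environment variable used to override the generator command are hypothetical and are not part of the JDK build; only the two file names come from the commands above.

```shell
# Hypothetical helper (not part of the JDK build): regenerate the AD file
# from its M4 source into a temporary file and compare it with the
# checked-in copy, without modifying the working tree.
# Returns 0 if both files are identical, 1 if they differ, 2 on error.
check_ad_in_sync() {
  m4_src=$1     # e.g. aarch64_neon_ad.m4
  ad_file=$2    # e.g. aarch64_neon.ad
  tmp=$(mktemp) || return 2
  # M4 is an assumed override for the generator command; defaults to m4.
  "${M4:-m4}" "$m4_src" > "$tmp" || { rm -f "$tmp"; return 2; }
  # Unified diff: '-' lines are checked in, '+' lines are freshly generated.
  diff -u "$ad_file" "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

Assuming the function is defined in the current shell, it could be run from src/hotspot/cpu/aarch64 as `check_ad_in_sync aarch64_neon_ad.m4 aarch64_neon.ad`; a non-zero exit status means the two files are out of sync.
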
-------------

PR: https://git.openjdk.org/jdk17u-dev/pull/811