From aph-open at littlepinkcloud.com  Fri Apr 12 18:10:11 2024
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Fri, 12 Apr 2024 19:10:11 +0100
Subject: [aarch64-port-dev ] Math: optimation for doing remainder
In-Reply-To:
References:
Message-ID: <258412a7-ab7d-4c90-80ad-715b23b32c30@littlepinkcloud.com>

On 4/12/24 11:37, Jin Guojie wrote:
> According to the technical documentation of Arm N2, MSUB instruction uses the same ALU as SDIV.
> After testing, it was found that the combination of MUL/SUB is much faster than MSUB.
> Below is a patch I wrote to optimize the operation of doing remainder.
> Testing with actual Java programs shows that the performance of this operation has indeed been significantly improved.

Interesting. I wrote a JMH test for this, and on Apple M1 separate MUL/SUB is
dramatically worse:

Before:

Divide.iters         32  avgt    5  650.431 ± 5.890  ns/op
Divide.iters  342862386  avgt    5  650.597 ± 4.460  ns/op

After:

Divide.iters         32  avgt    5  979.338 ± 1.266  ns/op
Divide.iters  342862386  avgt    5  978.652 ± 2.005  ns/op

... which is perhaps not surprising.

On another Neoverse machine I got a result very similar to yours, about 15%
faster with separate MUL/SUB.

To be honest with you, I hate very machine-specific performance tweaks.
The biggest problem is that testing is different on every kind of machine
if they all have machine-specific tweaks. Given that this is a pretty rare
case, for integer modulo by a non-constant value, and that the difference
is small, do you really need it?

I attached a JMH test for more reliable testing.

Finally, please send questions to hotspot-dev, with "AArch64" in the title,
or I may not see them.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Divide.java
Type: text/x-java
Size: 520 bytes
Desc: not available
URL:

From jinguojie.jgj at alibaba-inc.com  Thu Apr 18 02:29:40 2024
From: jinguojie.jgj at alibaba-inc.com (Jin Guojie)
Date: Thu, 18 Apr 2024 10:29:40 +0800
Subject: [aarch64-port-dev ] Aarch64: CPU_Model support for Neoverse N1/N2/V1/V2
In-Reply-To: <45ADE631-EFCF-4319-94B6-130E324E5907@amazon.co.uk>
References: <45ADE631-EFCF-4319-94B6-130E324E5907@amazon.co.uk>
Message-ID: <56ae2a53-02d3-4c29-8b13-37172654b5a1.jinguojie.jgj@alibaba-inc.com>

Hi Andrew,

We wrote a patch to improve the definition of CPU models for Arm Neoverse.
Evgeny thinks it's better to continue the review process.

I submitted my OCA application 10 days ago, but it is still under review.
Could you please create an issue in the JDK Bug System (JBS),
so that I can submit this PR after the OCA is signed?

Jin Guojie (Alibaba, hotspot developer)

2024/4/18 02:32. Astigeevich, Evgeny wrote:
> I agree using enums will improve readability.
> It's not been done to simplify backporting.
> Could you please create a JBS issue and submit a PR?
> Evgeny
> On 12/04/2024, 09:22, "Jin Guojie" wrote:
> Hi Evgeny,
> Thanks for your great work in "8321025: Enable Neoverse N1 optimizations for Neoverse V2".
> I am currently optimizing the Aarch64 branch of hotspot. I found that there are also some constant numbers in this file vm_version_aarch64.cpp.
> In order to make the programming style better, wouldn't it be better if we define these constants as macros?
> Below is the code patch I wrote. Thank you for your opinion.
>
> Jin Guojie
>
>
> From 2dd99c9851b0efbb3c9a8bdc95973f4646ad77c2 Mon Sep 17 00:00:00 2001
> From: Jin Guojie
> Date: Tue, 2 Apr 2024 09:06:04 +0800
> Subject: CPU_Model support for Neoverse N1/N2/V1/V2
> ---
>  src/hotspot/cpu/aarch64/vm_version_aarch64.cpp | 12 +++---------
>  src/hotspot/cpu/aarch64/vm_version_aarch64.hpp |  7 +++++++
>  2 files changed, 10 insertions(+), 9 deletions(-)
> diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
> index 18f310c746c..732020a420f 100644
> --- a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
> @@ -213,12 +213,8 @@ void VM_Version::initialize() {
>    }
>
>
>    // Neoverse
> -  // N1: 0xd0c
> -  // N2: 0xd49
> -  // V1: 0xd40
> -  // V2: 0xd4f
> -  if (_cpu == CPU_ARM && (model_is(0xd0c) || model_is(0xd49) ||
> -                          model_is(0xd40) || model_is(0xd4f))) {
> +  if (_cpu == CPU_ARM && (model_is(CPU_MODEL_NEOVERSE_N1) || model_is(CPU_MODEL_NEOVERSE_N2) ||
> +                          model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2))) {
>      if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) {
>        FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true);
>      }
> @@ -248,9 +244,7 @@ void VM_Version::initialize() {
>    }
>
>    // Neoverse
> -  // V1: 0xd40
> -  // V2: 0xd4f
> -  if (_cpu == CPU_ARM && (model_is(0xd40) || model_is(0xd4f))) {
> +  if (_cpu == CPU_ARM && (model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2))) {
>      if (FLAG_IS_DEFAULT(UseCryptoPmullForCRC32)) {
>        FLAG_SET_DEFAULT(UseCryptoPmullForCRC32, true);
>      }
> diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp
> index 6883dc0d93e..a9821ea50c4 100644
> --- a/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp
> +++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp
> @@ -114,6 +114,13 @@ enum Ampere_CPU_Model {
>    CPU_MODEL_AMPERE_1B = 0xac5 /* AMPERE_1B core Implements ARMv8.7 with CSSC, MTE, SM3/SM4 extensions */
>  };
>
> +enum Neoverse_CPU_Model {
> +  CPU_MODEL_NEOVERSE_N1 = 0xd0c,
> +  CPU_MODEL_NEOVERSE_N2 = 0xd49,
> +  CPU_MODEL_NEOVERSE_V1 = 0xd40,
> +  CPU_MODEL_NEOVERSE_V2 = 0xd4f,
> +};
> +
>  #define CPU_FEATURE_FLAGS(decl) \
>    decl(FP, fp, 0) \
>    decl(ASIMD, asimd, 1) \
> --
> 2.39.3

Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.

From aph-open at littlepinkcloud.com  Thu Apr 18 07:20:56 2024
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Thu, 18 Apr 2024 08:20:56 +0100
Subject: [aarch64-port-dev ] Aarch64: CPU_Model support for Neoverse N1/N2/V1/V2
In-Reply-To: <56ae2a53-02d3-4c29-8b13-37172654b5a1.jinguojie.jgj@alibaba-inc.com>
References: <45ADE631-EFCF-4319-94B6-130E324E5907@amazon.co.uk> <56ae2a53-02d3-4c29-8b13-37172654b5a1.jinguojie.jgj@alibaba-inc.com>
Message-ID:

On 4/18/24 03:29, Jin Guojie wrote:
> We wrote a patch to improve the definition of CPU models for Arm Neoverse.
> Evgeny thinks it's better to continue the review process.

Sure. My immediate reaction is that having separate categories for the Neoverse
CPUs is getting to be rather cumbersome. Clearly they have a lot in common,
and it would be nicer to be able to say things like

   "if CPU is Arm.Neoverse" or "is Arm.Neoverse.V2"

but right now I can't think of a nice way to do that. Maybe a nested class hierarchy?

> I submitted my OCA application 10 days ago, but it is still under review.
> Could you please create an issue in the JDK Bug System (JBS),
> so that I can submit this PR after the OCA is signed?

I will, but let's have some ideas about what the result should be.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From jinguojie.jgj at alibaba-inc.com  Tue Apr 30 03:24:03 2024
From: jinguojie.jgj at alibaba-inc.com (Jin Guojie)
Date: Tue, 30 Apr 2024 11:24:03 +0800
Subject: [aarch64-port-dev ] Reply：Aarch64: CPU_Model support for Neoverse N1/N2/V1/V2
In-Reply-To:
References:
Message-ID: <18565b39-db0a-4d49-a5e5-fa52c5fa8e85.jinguojie.jgj@alibaba-inc.com>

Hi Andrew,

On 2024/4/18 Andrew Haley wrote:
> On 4/18/24 03:29, Jin Guojie wrote:
>> We wrote a patch to improve the definition of CPU models for Arm Neoverse.
> Sure. My immediate reaction is that having separate categories for the Neoverse
> CPUs is getting to be rather cumbersome. Clearly they have a lot in common,
> and it would be nicer to be able to say things like
>    "if CPU is Arm.Neoverse" or "is Arm.Neoverse.V2"
> but right now I can't think of a nice way to do that. Maybe a nested class hierarchy?
>> Could you please create an issue in the JDK Bug System (JBS),
> I will, but let's have some ideas about what the result should be.

We have reworked the code style of the Neoverse CPU model definition.
To keep the code compatible with as many compilers as possible, we used simple
predicate functions in vm_version_aarch64.hpp.

We also looked at vm_version_x86.hpp and did not find the "nested class hierarchy"
syntax you mentioned. x86 determines the CPU type by defining a set of is_xxx()
functions, which is the style we use in the patch below.

The main program (vm_version_aarch64.cpp) looks more concise now.

Looking forward to your suggestions. Thanks.
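
For comparison, one possible reading of the "Arm.Neoverse" / "Arm.Neoverse.V2" idea quoted above is a set of nested structs whose static predicates mirror the grouping. This is only a sketch of that reading, not HotSpot code: the names Arm, Neoverse, N, V and matches() are invented for illustration, and only the part numbers are taken from the patch. A real check would still need the `_cpu == CPU_ARM` test as well.

```cpp
// Hypothetical sketch: none of these types exist in HotSpot.
// Only the part numbers (0xd0c, 0xd49, 0xd40, 0xd4f) come from the patch.
struct Arm {
  struct Neoverse {
    struct N {
      enum Part { N1 = 0xd0c, N2 = 0xd49 };
      static constexpr bool matches(int model) {
        return model == N1 || model == N2;
      }
    };
    struct V {
      enum Part { V1 = 0xd40, V2 = 0xd4f };
      static constexpr bool matches(int model) {
        return model == V1 || model == V2;
      }
    };
    // The family is the union of its series.
    static constexpr bool matches(int model) {
      return N::matches(model) || V::matches(model);
    }
  };
};

// "is Arm.Neoverse" and "is Arm.Neoverse.V" become qualified names.
static_assert(Arm::Neoverse::matches(Arm::Neoverse::N::N1), "N1 is in the family");
static_assert(Arm::Neoverse::V::matches(Arm::Neoverse::V::V2), "V2 is in the V series");
static_assert(!Arm::Neoverse::V::matches(Arm::Neoverse::N::N2), "N2 is not in the V series");
```

With this layout a single model would be tested as `model == Arm::Neoverse::V::V2`. Whether the extra nesting is worth it over flat is_xxx() functions is the open design question in this thread.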
-- 
Jin Guojie (Alibaba, hotspot developer)

diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
index 18f310c746c..5fc2b5cee2d 100644
--- a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
@@ -212,13 +212,7 @@ void VM_Version::initialize() {
     }
   }
 
-  // Neoverse
-  // N1: 0xd0c
-  // N2: 0xd49
-  // V1: 0xd40
-  // V2: 0xd4f
-  if (_cpu == CPU_ARM && (model_is(0xd0c) || model_is(0xd49) ||
-                          model_is(0xd40) || model_is(0xd4f))) {
+  if (is_neoverse_family()) {
     if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) {
       FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true);
     }
@@ -247,10 +241,7 @@ void VM_Version::initialize() {
     FLAG_SET_DEFAULT(UseCRC32, false);
   }
 
-  // Neoverse
-  // V1: 0xd40
-  // V2: 0xd4f
-  if (_cpu == CPU_ARM && (model_is(0xd40) || model_is(0xd4f))) {
+  if (is_neoverse_v_series()) {
     if (FLAG_IS_DEFAULT(UseCryptoPmullForCRC32)) {
       FLAG_SET_DEFAULT(UseCryptoPmullForCRC32, true);
     }
diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp
index f6cac72804f..323b7e8e151 100644
--- a/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp
+++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp
@@ -114,6 +114,13 @@ enum Ampere_CPU_Model {
   CPU_MODEL_AMPERE_1B = 0xac5 /* AMPERE_1B core Implements ARMv8.7 with CSSC, MTE, SM3/SM4 extensions */
 };
 
+enum Neoverse_CPU_Model {
+  CPU_MODEL_NEOVERSE_N1 = 0xd0c,
+  CPU_MODEL_NEOVERSE_N2 = 0xd49,
+  CPU_MODEL_NEOVERSE_V1 = 0xd40,
+  CPU_MODEL_NEOVERSE_V2 = 0xd4f,
+};
+
 #define CPU_FEATURE_FLAGS(decl) \
   decl(FP, fp, 0) \
   decl(ASIMD, asimd, 1) \
@@ -156,6 +163,22 @@ enum Ampere_CPU_Model {
     return _model == cpu_model || _model2 == cpu_model;
   }
 
+  static bool is_neoverse_family() {
+    return _cpu == CPU_ARM
+      && (model_is(CPU_MODEL_NEOVERSE_N1) || model_is(CPU_MODEL_NEOVERSE_N2) ||
+          model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2));
+  }
+
+  static bool is_neoverse_n_series() {
+    return is_neoverse_family() &&
+      (model_is(CPU_MODEL_NEOVERSE_N1) || model_is(CPU_MODEL_NEOVERSE_N2));
+  }
+
+  static bool is_neoverse_v_series() {
+    return is_neoverse_family() &&
+      (model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2));
+  }
+
   static bool is_zva_enabled() { return 0 <= _zva_length; }
   static int zva_length() {
     assert(is_zva_enabled(), "ZVA not available");
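
The three predicates in the patch above can be exercised outside HotSpot with a small stand-in. The sketch below is an assumption-laden illustration rather than the real VM_Version: the struct CpuId, the enumerator CPU_OTHER and the value given to CPU_ARM are invented here, and the fields are passed in explicitly instead of being static members. The predicate bodies follow the patch, including model_is() checking both model fields.

```cpp
// Hypothetical stand-in for the parts of VM_Version the patch touches.
// CPU_OTHER and the enumerator values here are assumptions of this sketch.
enum Family { CPU_OTHER = 0, CPU_ARM = 'A' };

enum Neoverse_CPU_Model {
  CPU_MODEL_NEOVERSE_N1 = 0xd0c,
  CPU_MODEL_NEOVERSE_N2 = 0xd49,
  CPU_MODEL_NEOVERSE_V1 = 0xd40,
  CPU_MODEL_NEOVERSE_V2 = 0xd4f,
};

struct CpuId {
  int cpu;     // plays the role of _cpu
  int model;   // plays the role of _model
  int model2;  // plays the role of _model2

  constexpr bool model_is(int cpu_model) const {
    return model == cpu_model || model2 == cpu_model;
  }
  constexpr bool is_neoverse_family() const {
    return cpu == CPU_ARM
      && (model_is(CPU_MODEL_NEOVERSE_N1) || model_is(CPU_MODEL_NEOVERSE_N2) ||
          model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2));
  }
  constexpr bool is_neoverse_n_series() const {
    return is_neoverse_family() &&
      (model_is(CPU_MODEL_NEOVERSE_N1) || model_is(CPU_MODEL_NEOVERSE_N2));
  }
  constexpr bool is_neoverse_v_series() const {
    return is_neoverse_family() &&
      (model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2));
  }
};

constexpr CpuId n2{CPU_ARM, CPU_MODEL_NEOVERSE_N2, 0};
constexpr CpuId v1_second{CPU_ARM, 0, CPU_MODEL_NEOVERSE_V1};  // match via model2
constexpr CpuId not_arm{CPU_OTHER, CPU_MODEL_NEOVERSE_V2, 0};

static_assert(n2.is_neoverse_n_series() && !n2.is_neoverse_v_series(), "N2 is N series only");
static_assert(v1_second.is_neoverse_v_series(), "a model2 match counts");
static_assert(!not_arm.is_neoverse_family(), "the implementer check gates every predicate");
```

The series predicates reach the `_cpu == CPU_ARM` test only through is_neoverse_family(), so the family call in them is not redundant even though the model comparisons overlap.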