RFR: 8331558: AArch64: optimize integer remainder [v2]

Bhavana Kilambi bkilambi at openjdk.org
Tue May 7 14:03:56 UTC 2024


On Mon, 6 May 2024 05:50:13 GMT, Jin Guojie <duke at openjdk.org> wrote:

>> 8331558: AArch64: optimize integer remainder
>>     On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction.
>> 
>> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2
>>     Add full platform coverage for Neoverse variants in vm_version.?pp
>> 
>> The following tests pass and show a clear performance improvement.
>> 
>> make test TEST="micro:java.lang.IntegerDivMod"
>> make test TEST="micro:java.lang.LongDivMod"
>> 
>> * IntegerDivMod.testDivideRemainderUnsigned
>> baseline(ns/ops)                2223
>> with this patch(ns/ops)         1885
>> improvement(%)                  17.93%
>> 
>> * IntegerDivMod.testRemainderUnsigned
>> baseline(ns/ops)                2225
>> with this patch(ns/ops)         1885
>> improvement(%)                  18.03%
>> 
>> * LongDivMod.testDivideRemainderUnsigned
>> baseline(ns/ops)                2231
>> with this patch(ns/ops)         1894
>> improvement(%)                  17.79%
>> 
>> * LongDivMod.testRemainderUnsigned
>> baseline(ns/ops)                2232
>> with this patch(ns/ops)         1891
>> improvement(%)                  18.03%
>
> Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into dev
>  - Update vm_version_aarch64.hpp
>  - 8331558: AArch64: optimize integer remainder
>    
>    On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction.
>  - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2
>    
>    Add full platform coverage for Neoverse variants in vm_version.?pp

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 462:

> 460:     if (VM_Version::supports_a53mac() && Ra != zr)
> 461:       nop();
> 462:     if (VM_Version::is_neoverse_n_series()) {

Why only the Neoverse N series? On the V series (V1 and V2) as well, both `sdiv`/`udiv` and `msub` execute on the M0 unit (integer multi-cycle), so the V series should benefit too. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/

A quick run on a V1 machine shows a ~15% performance gain for the `IntegerDivMod` tests when we generate separate `mul` and `sub` instructions instead of a single `msub`.
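
For reference, the arithmetic being discussed can be sketched in plain Java. This is only an illustration of the identity `remainder = dividend - quotient * divisor` that the generated code relies on; the class and method names are hypothetical and are not part of the patch, and which instructions the JIT actually emits for this code depends on the CPU and on the change under review.

```java
// Illustrative sketch only: RemainderSketch is a hypothetical name, not code from the PR.
class RemainderSketch {

    // Unsigned 32-bit remainder written as the three steps the backend performs.
    static int remainderUnsigned(int dividend, int divisor) {
        int quotient = Integer.divideUnsigned(dividend, divisor); // udiv
        int product = quotient * divisor;                         // mul  (or fused into msub)
        return dividend - product;                                // sub  (or fused into msub)
    }

    // Same identity for unsigned 64-bit values.
    static long remainderUnsigned(long dividend, long divisor) {
        long quotient = Long.divideUnsigned(dividend, divisor);
        long product = quotient * divisor;
        return dividend - product;
    }

    public static void main(String[] args) {
        // Compare the hand-written identity with the library methods.
        System.out.println(remainderUnsigned(-1, 10) == Integer.remainderUnsigned(-1, 10));
        System.out.println(remainderUnsigned(-1L, 10L) == Long.remainderUnsigned(-1L, 10L));
    }
}
```

The multiply and subtract wrap around in two's complement, which is why the identity also holds for operands that are negative when read as signed values.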

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1592539756

