RFR: 8331558: AArch64: optimize integer remainder [v2]

Wed May 8 02:26:58 UTC 2024

On Mon, 6 May 2024 10:08:30 GMT, Jin Guojie <duke at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 447:
>> 
>>> 445:   inline void msub(Register Rd, Register Rn, Register Rm, Register Ra) {
>>> 446:     if (VM_Version::supports_a53mac() && Ra != zr)
>>> 447:       nop();
>> 
>> This first appeared in JDK-8079203 [1]. May I ask what is special about a53mac?
>> 
>> [1] https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3
>
> This code entered the JDK in 2015. Frankly, I have no idea why an extra nop is needed on CPUs with the a53mac feature. 
> Perhaps the author of patch a65f9f9589, enevill at openjdk.org, could explain?

> This first appeared in JDK-8079203 [1]. May I ask what is special about a53mac?
> 
> [1] [a65f9f9](https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3)

@e1iu 
The a53mac flag guards a workaround for a Cortex-A53 erratum, which is described in this document:

**Cortex-A53 MPCore Product Revision r0 - Software Developers Errata Notice**
https://developer.arm.com/documentation/EPM048406/2000/?lang=en

> 835769: AArch64 multiply-accumulate instruction might produce incorrect result
> 
> Description
> When executing in AArch64 state, some multiply-accumulate instructions which read an accumulator operand from the
> result of an earlier multiply instruction might produce incorrect results.
> 
> Workaround
> The only viable workaround is to avoid any of these code sequences, typically by avoiding the use of
> multiply-accumulate instructions, or by inserting a NOP between any adjacent load/store/prefetch instruction and
> multiply-accumulate instruction with no data dependency between them.
>
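
To make the connection to the quoted `msub` wrapper concrete, here is a minimal sketch of the NOP-insertion workaround. It is a toy model, not HotSpot code: `ToyAssembler`, `Reg`, and the `a53mac` field are hypothetical stand-ins for `MacroAssembler`, `Register`, and `VM_Version::supports_a53mac()`, and instructions are recorded as mnemonic strings rather than encoded. Skipping the NOP when `Ra` is `zr` presumably reflects that MSUB with the zero register as accumulator is the MNEG alias, so no accumulator value is read.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for HotSpot's Register type and the zero register.
enum Reg { r0, r1, r2, r3, zr };

// Toy assembler that records mnemonics instead of encoding instructions.
struct ToyAssembler {
  bool a53mac;                   // models VM_Version::supports_a53mac()
  std::vector<std::string> out;  // emitted instruction mnemonics, in order

  explicit ToyAssembler(bool has_a53mac) : a53mac(has_a53mac) {}

  void nop() { out.push_back("nop"); }

  // Same guard as the quoted wrapper: pad with a NOP only when the CPU is
  // flagged as affected and the accumulator is a real register.
  void msub(Reg rd, Reg rn, Reg rm, Reg ra) {
    (void)rd; (void)rn; (void)rm;  // operands are not modeled in this sketch
    if (a53mac && ra != zr) nop();
    out.push_back("msub");
  }

  // Most recently emitted mnemonic; requires at least one emitted instruction.
  const std::string& last() const {
    assert(!out.empty());
    return out.back();
  }
};
```

With `a53mac` set, `msub(r0, r1, r2, r3)` records `nop` followed by `msub`; with `ra == zr`, or on a CPU without the flag, only `msub` is recorded. The wrapper emits the NOP unconditionally rather than inspecting the preceding instruction, which is conservative but keeps the guard to a single check.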

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593300840