RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53

Tue Apr 15 14:52:57 UTC 2025

On Tue, 15 Apr 2025 14:23:08 GMT, Stuart Monteith <smonteith at openjdk.org> wrote:

>> The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 emits an additional nop with each madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769, "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump into the unrolled loop code, assuming 2 instructions per unrolled iteration (load and madd). The extra nop that JDK-8079203 adds on Cortex-A53 breaks this offset calculation.
>>  
>> The current offset calculation uses a shift instead of a multiplication, which works because each unrolled loop iteration contains a power-of-2 number of instructions. To keep it simple, this fix adds one more nop to each loop iteration on Cortex-A53 so that there are 4 instructions per iteration, which is also a power of 2. To account for that, the shift amount in the offset calculation is increased by 1, because each loop iteration has twice as many instructions on Cortex-A53.
>>  
>> This fix was tested on a Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and the hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes.
>> 
>> The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark.
>
> You can check the Arm website for specific CPU errata, but I will find out about any serious errata when they are discovered. The kind of errata found on the A53 doesn't really occur any more or is mitigated by other means, so later generations haven't required mitigations in software.

@stooart-mon I did, and I must admit it was a painful experience. The conclusion I came to is that we need an Arm errata expert to look into OpenJDK codegen. I see that Arm carefully maintains the GCC code base for that purpose, and its engineers are probably much better experts in this area. Patching there happens only after checking whether the conditions for the given erratum are met, which is a much narrower test than what was done as part of, say, JDK-8079203, and much less clumsy. Specific CPU registers can be employed to check the need for patching.
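
To illustrate the register-based check, here is a minimal sketch that decodes a MIDR_EL1 value and tests whether it identifies a Cortex-A53. The class and method names are hypothetical and this is not HotSpot code; it only assumes the architectural MIDR_EL1 field layout, the Arm implementer code 0x41, and the Cortex-A53 part number 0xD03. Whether a given erratum applies also depends on the variant and revision fields, which the relevant errata notice has to supply.

```java
// Hypothetical helper, not part of HotSpot: decodes MIDR_EL1 fields.
public class MidrCheck {
    static final int IMPLEMENTER_ARM = 0x41;  // 'A', Arm Limited
    static final int PART_CORTEX_A53 = 0xD03; // Cortex-A53 primary part number

    // MIDR_EL1 layout: [31:24] implementer, [23:20] variant,
    // [19:16] architecture, [15:4] part number, [3:0] revision.
    static int implementer(long midr) { return (int) ((midr >>> 24) & 0xFF); }
    static int variant(long midr)     { return (int) ((midr >>> 20) & 0xF); }
    static int partNum(long midr)     { return (int) ((midr >>> 4) & 0xFFF); }
    static int revision(long midr)    { return (int) (midr & 0xF); }

    // True when the register value identifies an Arm Cortex-A53 core.
    static boolean isCortexA53(long midr) {
        return implementer(midr) == IMPLEMENTER_ARM
            && partNum(midr) == PART_CORTEX_A53;
    }

    public static void main(String[] args) {
        long midr = 0x410FD034L; // example value: Cortex-A53, variant 0, revision 4
        System.out.println("Cortex-A53: " + isCortexA53(midr)
            + " r" + variant(midr) + "p" + revision(midr));
    }
}
```

A real check would compare the variant and revision against the affected range listed for each erratum before enabling a workaround.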

Developing good analogues to `-mfix-cortex-a57-aes-1742098`, `-mfix-cortex-a72-aes-1655431`, `-mfix-cortex-a53-835769`, and `-mfix-cortex-a53-843419` would probably be a good start, but again, the full extent is much larger [1].
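
For readers following the quoted PR description, the shift-based offset arithmetic it describes can be sketched as follows. This is a hypothetical model in plain Java, not the actual MacroAssembler code; it assumes only that A64 instructions are 4 bytes wide and that an unrolled iteration has 2 instructions normally and 4 on Cortex-A53 after the fix.

```java
// Hypothetical model of the jump-offset arithmetic, not HotSpot code.
public class UnrolledJumpOffset {
    static final int INSN_BYTES = 4; // A64 instructions have a fixed 4-byte width

    // Bytes of code emitted per unrolled iteration:
    // load + madd normally; load + madd + 2 nops on Cortex-A53 (assumed).
    static int bytesPerIteration(boolean cortexA53) {
        int insnsPerIteration = cortexA53 ? 4 : 2;
        return insnsPerIteration * INSN_BYTES;
    }

    // The shift replaces a multiplication, so it is only valid when
    // the iteration size in bytes is a power of 2.
    static int shiftFor(boolean cortexA53) {
        int bytes = bytesPerIteration(cortexA53);
        if (Integer.bitCount(bytes) != 1) {
            throw new IllegalStateException("iteration size is not a power of 2");
        }
        return Integer.numberOfTrailingZeros(bytes);
    }

    // Byte offset that skips the given number of unrolled iterations.
    static int bytesToSkip(int iterations, boolean cortexA53) {
        return iterations << shiftFor(cortexA53);
    }

    public static void main(String[] args) {
        System.out.println(shiftFor(false) + " " + shiftFor(true));
        System.out.println(bytesToSkip(3, false) + " " + bytesToSkip(3, true));
    }
}
```

With a single extra nop, an iteration would be 3 instructions (12 bytes), which no shift can express; that is why the fix pads to 4 instructions and raises the shift by 1.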

[1] Some Arm errata:
A53: https://developer.arm.com/documentation/epm048406/2100/?lang=en
A55: https://developer.arm.com/documentation/SDEN859338/1500/?lang=en
A72: https://developer.arm.com/documentation/epm012079/11/?lang=en
A75: https://developer.arm.com/documentation/SDEN859515/i/?lang=en
A76: https://developer.arm.com/documentation/SDEN-885749/3200/?lang=en
N1: https://developer.arm.com/documentation/SDEN885747/latest/
V1: https://developer.arm.com/documentation/SDEN1401781/latest/
V2: https://developer.arm.com/documentation/SDEN2332927/latest/
N2: https://developer.arm.com/documentation/SDEN1982442/latest/

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2805697524