RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on some hardware
Koutheir Attouchi
duke at openjdk.org
Tue Apr 8 17:12:25 UTC 2025
On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:
> The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 emits an additional nop with each madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769, "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump into the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an extra nop on Cortex-A53, which breaks the offset calculation.
>
> The current offset calculation uses a shift instead of a multiplication, which works because each unrolled loop iteration contains a power-of-2 number of instructions. To keep it simple, this fix adds one more nop to each loop iteration on Cortex-A53 so that there are 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument in the offset calculation is increased by 1, because each loop iteration has twice as many instructions on Cortex-A53.
>
> This fix was tested on a Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and the hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark.
I'm saying that a simple HelloWorld reproduces the issue every time on the Rock64.
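
For reference, the offset arithmetic described in the quoted PR text can be sketched as follows. This is only an illustrative model, not the HotSpot code: the class and method names (UnrolledLoopOffset, shiftFor, offsetByShift, exactOffset) and the iterationsToSkip parameter are made up for this example. The only hardware fact it relies on is that every AArch64 instruction is 4 bytes wide. It shows why 2 or 4 instructions per iteration work with a shift, while 3 (load, nop, madd) do not.

```java
// Hypothetical model of the jump-offset arithmetic; NOT the HotSpot implementation.
final class UnrolledLoopOffset {
    // AArch64 uses fixed-width 32-bit (4-byte) instructions.
    static final int INSTRUCTION_BYTES = 4;

    // Exact byte offset of the jump target: skipped iterations times bytes per iteration.
    static int exactOffset(int iterationsToSkip, int instructionsPerIteration) {
        return iterationsToSkip * instructionsPerIteration * INSTRUCTION_BYTES;
    }

    // Offset computed with a shift, as the quoted description says the intrinsic does.
    static int offsetByShift(int iterationsToSkip, int shift) {
        return iterationsToSkip << shift;
    }

    // Shift amount that matches a given iteration size.
    // Only defined when the iteration size in bytes is a power of two.
    static int shiftFor(int instructionsPerIteration) {
        int bytesPerIteration = instructionsPerIteration * INSTRUCTION_BYTES;
        if (bytesPerIteration <= 0 || Integer.bitCount(bytesPerIteration) != 1) {
            throw new IllegalArgumentException(
                bytesPerIteration + " bytes per iteration is not a power of two");
        }
        return Integer.numberOfTrailingZeros(bytesPerIteration);
    }

    public static void main(String[] args) {
        int skip = 5; // arbitrary number of iterations to skip, chosen for the demo

        // Assumed layout without the erratum workaround: load + madd.
        int shift2 = shiftFor(2);
        System.out.println("2 insns: shift=" + shift2
            + " shifted=" + offsetByShift(skip, shift2)
            + " exact=" + exactOffset(skip, 2));

        // Workaround nop added but shift left unchanged: load + nop + madd.
        System.out.println("3 insns: shifted=" + offsetByShift(skip, shift2)
            + " exact=" + exactOffset(skip, 3));

        // Padded to 4 instructions: the shift grows by one and matches again.
        int shift4 = shiftFor(4);
        System.out.println("4 insns: shift=" + shift4
            + " shifted=" + offsetByShift(skip, shift4)
            + " exact=" + exactOffset(skip, 4));
    }
}
```

With 3 instructions per iteration the shifted offset and the exact offset disagree, so the jump would land at the wrong place in the unrolled code. Padding to 4 instructions keeps the shift-based calculation valid at the cost of one extra nop per iteration.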
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787132353
More information about the hotspot-compiler-dev mailing list