RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2]
Xiaohong Gong
xgong at openjdk.org
Thu May 22 05:48:57 UTC 2025
On Mon, 19 May 2025 09:23:28 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update src/hotspot/share/opto/superword.cpp
>>
>> Co-authored-by: Manuel Hässig <manuel at haessig.org>
>
>> Impressive analysis, Emanuel! Very deep, thorough, and insightful.
>
> +1 to this. Great work, Emanuel! The fix looks good to me.
> @TobiHartmann Thank you for the review :)
>
> @theRealAph @XiaohongGong Do you have any idea about the somewhat confusing behavior of aarch64 in these benchmarks?
Hi @eme64, to be honest, I'm not quite sure about the unaligned memory access behavior on AArch64. I tried to clarify it by reading some Arm docs, but unfortunately the main message I got is that it is implementation-defined hardware behavior. Some AArch64 micro-architectures perform better when loads are aligned rather than stores, while others may prefer the opposite. That's the reality.
My colleague pointed me to several patches in the Go project that also use an option to prefer either load alignment or store alignment in a memory move library optimization [1][2][3] on AArch64. Different AArch64 micro-architectures can choose the optimal alignment strategy based on performance results, and the default for Neoverse CPUs is to align loads. Hope this helps. I think the basic idea aligns with what you did in this PR. Thanks!
[1] https://go-review.googlesource.com/c/go/+/243357
[2] https://github.com/golang/go/blob/7f806c1052aa919c1c195a5b2223626beab2495c/src/runtime/cpuflags_arm64.go#L11
[3] https://go-review.googlesource.com/c/go/+/664038
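
For illustration only, the load-versus-store alignment choice described above might be sketched as below. This is a minimal sketch under stated assumptions: `AlignPreference` and `prologue_bytes` are hypothetical names, not taken from HotSpot or the Go runtime, and the real implementations in [1][2][3] are written differently. It only shows how a per-micro-architecture preference could decide which pointer a copy loop aligns before its main body.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical preference flag: which side of a copy should be aligned.
// A real runtime would set this once, based on the detected CPU.
enum class AlignPreference { Loads, Stores };

// Returns how many leading bytes to copy before the main loop so that the
// preferred pointer (source for Loads, destination for Stores) becomes a
// multiple of `alignment`. `alignment` must be a power of two.
inline std::size_t prologue_bytes(std::uintptr_t src, std::uintptr_t dst,
                                  std::size_t alignment, AlignPreference pref) {
    const std::uintptr_t p = (pref == AlignPreference::Loads) ? src : dst;
    const std::uintptr_t mask = alignment - 1;
    // Distance from p up to the next aligned address; 0 if already aligned.
    return static_cast<std::size_t>((alignment - (p & mask)) & mask);
}
```

When source and destination have different misalignments, only one of them can be aligned by the prologue, which is why the preference has to be picked per micro-architecture.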
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2899989790