RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2]

Emanuel Peter epeter at openjdk.org
Thu May 22 06:45:01 UTC 2025


On Thu, 22 May 2025 05:46:02 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>>> Impressive analysis, Emanuel! Very deep, thorough, and insightful.
>> 
>> +1 to this. Great work, Emanuel! The fix looks good to me.
>
>> @TobiHartmann Thank you for the review :)
>> 
>> @theRealAph @XiaohongGong Do you have any idea about the somewhat confusing behavior of aarch64 in these benchmarks?
> 
> Hi @eme64, to be honest, I'm not quite sure about the unaligned memory access behavior on AArch64. I tried to clarify it by reading some ARM documentation, but unfortunately, the main message I got is that it is implementation-defined hardware behavior. Some AArch64 micro-architectures perform better when loads rather than stores are aligned, while others may prefer the opposite. That's the reality.
> 
> My colleague pointed me to several patches in the Go project that also use an option to prefer either load or store alignment, in an AArch64 optimization of the memory-move library routine [1][2][3]. Each AArch64 micro-architecture can choose the optimal alignment based on its performance results, and by default the loads are aligned on Neoverse CPUs. Hope this helps. I think the basic idea is in line with what you did in this PR. Thanks!
> 
> [1] https://go-review.googlesource.com/c/go/+/243357
> [2] https://github.com/golang/go/blob/7f806c1052aa919c1c195a5b2223626beab2495c/src/runtime/cpuflags_arm64.go#L11
> [3] https://go-review.googlesource.com/c/go/+/664038

@XiaohongGong Thanks a lot for taking the time to respond! That is very fascinating, and reassuring. Seems I'm not the only one seeing these kinds of results :)

I suppose we could add a similar flag to target the `aarch64` machines where load alignment is preferred. But from what I see, the wins would be marginal, and I don't know `aarch64` well enough to figure out which implementations would benefit. But if anyone wants to take this on, I'd be happy to review the PR ;)
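The load-vs-store trade-off discussed above comes from the fact that when source and destination sit at different offsets relative to a vector boundary, a scalar pre-loop can align only one of the two streams. The following minimal sketch assumes 16-byte (128-bit) vectors, 4-byte elements, and a hypothetical `prefer_align_loads` switch. That switch is not an existing HotSpot or Go option. The sketch shows how the choice determines which accesses end up aligned:

```python
# Hypothetical sketch: pick which stream (loads from src, stores to dst)
# a scalar pre-loop should align before the vectorized main loop.
# Assumptions: 16-byte vectors, 4-byte elements, addresses are multiples
# of the element size; prefer_align_loads is an illustrative switch only.

VECTOR_BYTES = 16  # assumed vector width (e.g. 128-bit registers)
ELEM_BYTES = 4     # assumed element size (e.g. int)


def pre_loop_iters(addr: int, vector_bytes: int, elem_bytes: int) -> int:
    """Scalar iterations needed before `addr` reaches a vector boundary.

    Requires vector_bytes to be a power of two and a multiple of elem_bytes.
    """
    misalign = addr & (vector_bytes - 1)
    bytes_to_boundary = (vector_bytes - misalign) & (vector_bytes - 1)
    return bytes_to_boundary // elem_bytes


def choose_pre_loop_iters(src_addr: int, dst_addr: int, vector_bytes: int,
                          elem_bytes: int, prefer_align_loads: bool) -> int:
    """Align the load stream (src) or the store stream (dst), never both in general."""
    target = src_addr if prefer_align_loads else dst_addr
    return pre_loop_iters(target, vector_bytes, elem_bytes)


def is_aligned(addr: int, vector_bytes: int) -> bool:
    return (addr & (vector_bytes - 1)) == 0


def can_align_both(src_addr: int, dst_addr: int, vector_bytes: int) -> bool:
    """Both streams can be aligned only if they share the same misalignment."""
    return (src_addr - dst_addr) % vector_bytes == 0


if __name__ == "__main__":
    src, dst = 0x1004, 0x2008  # 4 and 8 bytes past a 16-byte boundary
    for prefer_loads in (True, False):
        n = choose_pre_loop_iters(src, dst, VECTOR_BYTES, ELEM_BYTES, prefer_loads)
        src_ok = is_aligned(src + n * ELEM_BYTES, VECTOR_BYTES)
        dst_ok = is_aligned(dst + n * ELEM_BYTES, VECTOR_BYTES)
        print(f"prefer_loads={prefer_loads}: pre-loop={n}, "
              f"loads aligned={src_ok}, stores aligned={dst_ok}")
    # prefer_loads=True: pre-loop=3, loads aligned=True, stores aligned=False
    # prefer_loads=False: pre-loop=2, loads aligned=False, stores aligned=True
```

Whichever stream is left misaligned may have vector accesses that cross cache-line boundaries. A per-micro-architecture flag like the one mentioned above would only decide which side pays that cost.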

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2900093404


More information about the hotspot-compiler-dev mailing list