RFR: 8343689: AArch64: Optimize MulReduction implementation [v4]

Xiaohong Gong xgong at openjdk.org
Fri Jul 4 02:03:43 UTC 2025


On Thu, 3 Jul 2025 10:24:02 GMT, Mikhail Ablakatov <mablakatov at openjdk.org> wrote:

>> We can directly use `ptrue` here, which maps to `p7`, a register that is reserved and initialized as all-true.
>
> Done, although this has shifted the performance a bit:
> 
> 
> | Benchmark                | Before (ops/ms) | After (ops/ms) | Diff (%) |
> | ------------------------ | --------------- | -------------- | -------- |
> | ByteMaxVector.MULLanes   | 9883.151        | 9093.557       | -7.99%   |
> | DoubleMaxVector.MULLanes | 2712.674        | 2607.367       | -3.89%   |
> | FloatMaxVector.MULLanes  | 3388.811        | 3291.429       | -2.88%   |
> | IntMaxVector.MULLanes    | 4765.554        | 5031.741       | +5.58%   |
> | LongMaxVector.MULLanes   | 2685.228        | 2896.445       | +7.88%   |
> | ShortMaxVector.MULLanes  | 5128.185        | 5197.656       | +1.35%   |
> 
> 
> On average, the results didn't get worse. I suggest merging the updated version as is: the shift seems to be caused by micro-architectural effects not directly related to this PR, and overall the PR still improves performance by an order of magnitude (see https://github.com/openjdk/jdk/pull/23181#issuecomment-3018988067 for performance numbers before the PR). I intend to investigate the reasons behind this more closely later.

I'm fine with the latest version, because it avoids the mask generation and saves a temporary predicate register. The minor regressions are acceptable to me.

BTW, I'm not sure whether a masked operation over partial lanes is more efficient than computing over all lanes. This may depend on the hardware micro-architecture implementation; I haven't investigated it before. Additionally, all lanewise operations (e.g. `MulV`/`AddV`/...) with a partial vector size are currently implemented with `ptrue`. I agree with keeping it as is and investigating this later.
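A minimal scalar model may help show why computing over all physical lanes with an all-true predicate does not disturb the lanes that matter. It assumes, as a simplification not confirmed by this message, that the reduction repeatedly folds the upper half of the logical vector onto the lower half until one lane remains. `MulReductionModel` and `mulReduce` are hypothetical names; this is an illustrative sketch, not HotSpot code.

```java
/**
 * Hypothetical scalar model, not HotSpot code: a multiply reduction that
 * repeatedly folds the upper half of the logical vector onto the lower half.
 * Each fold is a lanewise multiply over ALL physical lanes (as an all-true
 * predicate such as ptrue would do); lanes at or beyond the current logical
 * length produce don't-care values that later steps never read into lane 0.
 */
public class MulReductionModel {
    static long mulReduce(long[] reg, int logicalLanes) {
        long[] v = reg.clone();
        int len = logicalLanes; // assumed to be a power of two
        while (len > 1) {
            int half = len / 2;
            long[] shifted = new long[v.length];
            for (int i = 0; i < v.length; i++) {
                // Model of moving lane (i + half) down into lane i;
                // lanes shifted in from past the register end read as 0.
                shifted[i] = (i + half < v.length) ? v[i + half] : 0;
            }
            for (int i = 0; i < v.length; i++) {
                v[i] = v[i] * shifted[i]; // lanewise multiply on every lane
            }
            len = half;
        }
        return v[0];
    }

    public static void main(String[] args) {
        // 8 physical lanes; the logical vector occupies the first 4.
        long[] reg = {2, 3, 5, 7, 100, 200, 300, 400};
        System.out.println(mulReduce(reg, 4)); // prints 210
    }
}
```

After the first fold, lanes outside the logical vector hold don't-care products (lane 2 becomes `5 * 100 = 500`), but lane 0 only ever reads lanes inside the logical vector, so the result is `2 * 3 * 5 * 7 = 210`.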

Thanks for the update!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2184132213


More information about the hotspot-compiler-dev mailing list