RFR: 8343689: AArch64: Optimize MulReduction implementation [v4]
Xiaohong Gong
xgong at openjdk.org
Wed Jul 2 01:45:46 UTC 2025
On Tue, 1 Jul 2025 16:07:59 GMT, Mikhail Ablakatov <mablakatov at openjdk.org> wrote:
>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2097:
>>
>>> 2095: sve_movprfx(vtmp1, vsrc); // copy
>>> 2096: sve_ext(vtmp1, vtmp1, vector_length_in_bytes / 2); // swap halves
>>> 2097: sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc); // multiply halves
>>
>>> sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc);
>>
>> Can we use `ptrue` instead of `pgtmp` here? The higher bits would be computed as well, but they have no influence on the final result, right?
>
> Thanks! For some reason I thought that we don't have a dedicated predicate register for that.
We can use `ptrue` directly here. It maps to `p7`, which is preserved and initialized as all true.
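To illustrate why the choice of predicate should not matter, here is a minimal scalar sketch of the three quoted instructions. It is not HotSpot code: `halving_step` and `reduce_low_half` are hypothetical helpers, lanes are modelled as `uint32_t` so that overflow wraps, and it assumes the rest of the reduction reads only the lower half of `vtmp1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the quoted movprfx/ext/mul sequence.
// `active` is the number of low lanes enabled by the governing predicate:
// n / 2 models `pgtmp`, n models the all-true `ptrue`.
std::vector<uint32_t> halving_step(const std::vector<uint32_t>& src,
                                   std::size_t active) {
    const std::size_t n = src.size();
    assert(n % 2 == 0 && active <= n);
    std::vector<uint32_t> tmp(n);
    // movprfx + ext by half the vector length: rotate so the halves swap.
    for (std::size_t i = 0; i < n; ++i) {
        tmp[i] = src[(i + n / 2) % n];
    }
    // Merging predicated multiply: inactive lanes keep their old value.
    for (std::size_t i = 0; i < active; ++i) {
        tmp[i] *= src[i];
    }
    return tmp;
}

// Multiply the low n / 2 lanes together. This stands in for the remaining
// reduction steps, which are assumed to ignore the upper half.
uint32_t reduce_low_half(const std::vector<uint32_t>& v) {
    uint32_t acc = 1;
    for (std::size_t i = 0; i < v.size() / 2; ++i) {
        acc *= v[i];
    }
    return acc;
}
```

Calling `halving_step(src, n / 2)` and `halving_step(src, n)` on the same input gives identical lower halves. Only the upper lanes differ, and under the assumption above nothing reads them afterwards.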
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178816427