RFR: 8343689: AArch64: Optimize MulReduction implementation [v4]

Xiaohong Gong xgong at openjdk.org
Wed Jul 2 01:45:46 UTC 2025


On Tue, 1 Jul 2025 16:07:59 GMT, Mikhail Ablakatov <mablakatov at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2097:
>> 
>>> 2095:   sve_movprfx(vtmp1, vsrc);                                // copy
>>> 2096:   sve_ext(vtmp1, vtmp1, vector_length_in_bytes / 2);       // swap halves
>>> 2097:   sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc); // multiply halves
>> 
>>> sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc);
>> 
>> Can we use `ptrue` instead of `pgtmp` here? The higher lanes will also be computed, but they have no influence on the final result, right?
>
> Thanks! For some reason I thought that we didn't have a dedicated predicate register for that.

We can use `ptrue` directly here. It maps to `p7`, which is reserved and initialized to all-true.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178816427

