RFR: 8343689: AArch64: Optimize MulReduction implementation [v3]

Mikhail Ablakatov mablakatov at openjdk.org
Mon Jun 30 13:25:10 UTC 2025


On Thu, 27 Feb 2025 03:28:51 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - fixup: don't modify the value in vsrc
>>    
>>    Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this
>>    change, the result of recursive folding is held in vtmp1. To be able to
>>    pass this intermediate result to reduce_mul_integral_le128b(), we would
>>    have to use another temporary FloatRegister, as vtmp1 would essentially
>>    act as vsrc. It's possible to get around this however:
>>    reduce_mul_integral_le128b() is modified so it's possible to pass
>>    matching vsrc and vtmp2 arguments. By doing this, we save ourselves a
>>    temporary register in rules that match to reduce_mul_integral_gt128b().
>>  - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 3012:
> 
>> 3010:                             vReg tmp1, vReg tmp2) %{
>> 3011:   predicate(Matcher::vector_length_in_bytes(n->in(2)) == 8 ||
>> 3012:             Matcher::vector_length_in_bytes(n->in(2)) == 16);
> 
> Suggestion:
> 
>   predicate(Matcher::vector_length_in_bytes(n->in(2)) <= 16);

The patch doesn't add or modifies these lines. I'd prefer to leave the footprint as small as possible and leave it as is..

> `    sve_ext(vtmp1, vtmp2, vector_length_in_bytes / 2);`

This doesn't look right. `vtmp1` is both an src and dest register operand for `EXT` but it contains an undefined value at the first iteration of the loop.

I agree that the implementation of the loop is confusing. I'll rework and (hopefully) simplify it, thank you for pointing this out.

Hi @XiaohongGong , I hope I've been able to simplify the implementation. It doesn't utilize a lambda anymore, but I had to peel the first iteration out of the loop anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2154422774
PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2154415974
PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174988523


More information about the hotspot-compiler-dev mailing list