RFR: 8343689: AArch64: Optimize MulReduction implementation [v8]

Fri Aug 8 14:51:20 UTC 2025

On Fri, 11 Jul 2025 12:23:58 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> I see. Thanks for your explanation. 
>> Current version is okay to me. Perhaps we may want to add more comments here.
>> 
>> Suggestion:
>> 
>> // Note: vsrc and vtmp2 may match when this function is invoked by `reduce_mul_integral_gt128b()`
>> // as a tail call and vsrc holds the intermediate results.
>
> The current code is just the sort of trap for the maintainer that leads to hard-to-find bugs. It'd be much better to remove the need for this comment by forcing everyone to provide two distinct scratch registers.

@theRealAph, fixed. The implementation no longer tries to do anything smart: it ensures that [all registers](https://github.com/openjdk/jdk/pull/23181/files#diff-75bfb44278df267ce4978393b9b6b6030a7e23065ca15436fb1a5009debc6e81R2002) [are different](https://github.com/openjdk/jdk/pull/23181/files#diff-75bfb44278df267ce4978393b9b6b6030a7e23065ca15436fb1a5009debc6e81R2091) for all supported integer types [except `T_LONG`](https://github.com/openjdk/jdk/pull/23181/files#diff-75bfb44278df267ce4978393b9b6b6030a7e23065ca15436fb1a5009debc6e81R2089), which is a special case. We [pass](https://github.com/openjdk/jdk/pull/23181/files#diff-edf6d70f65d81dc12a483088e0610f4e059bd40697f242aedfed5c2da7475f1aR3519) a couple of `fnoreg`s for `T_LONG` because the implementation for this type requires fewer temporary vregs.
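
To illustrate the idea of the check (not the actual patch), here is a minimal standalone sketch. `VReg`, `kNoReg`, and `all_distinct` are hypothetical names made up for this example; they are not HotSpot types or functions. The sketch assumes that a placeholder register (standing in for `fnoreg`) is skipped by the check, which is only one possible way to handle the `T_LONG` case.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for a vector register handle (not a HotSpot type).
struct VReg { int id; };

// Hypothetical placeholder playing the role of fnoreg in this sketch.
constexpr VReg kNoReg{-1};

// Returns true when no two real registers in the list alias each other.
// Assumption of this sketch: placeholder entries are ignored, because they
// do not name a register that could be clobbered.
inline bool all_distinct(std::initializer_list<VReg> regs) {
  for (auto a = regs.begin(); a != regs.end(); ++a) {
    if (a->id == kNoReg.id) continue;          // skip placeholders
    for (auto b = a + 1; b != regs.end(); ++b) {
      if (b->id == a->id) return false;        // two operands share a register
    }
  }
  return true;
}
```

With a check of this shape, a caller that passes the same register as both the source and a scratch operand is rejected up front, while a caller that passes two placeholders for unused scratch operands is accepted.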

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2263191527