[aarch64-port-dev ] Math: optimization for doing remainder

Andrew Haley aph-open at littlepinkcloud.com
Fri Apr 12 18:10:11 UTC 2024


On 4/12/24 11:37, Jin Guojie wrote:
> According to the technical documentation of Arm N2, the MSUB instruction uses the same ALU as SDIV.
> After testing, it was found that the combination of MUL/SUB is much faster than MSUB.
> Below is a patch I wrote to optimize the remainder operation.
> Testing with actual Java programs shows that the performance of this operation has indeed been significantly improved.

Interesting. I wrote a JMH test for this, and on Apple M1 separate MUL/SUB
is dramatically worse:

Before:

Divide.iters             32  avgt    5  650.431 ± 5.890  ns/op
Divide.iters      342862386  avgt    5  650.597 ± 4.460  ns/op

After:

Divide.iters             32  avgt    5  979.338 ± 1.266  ns/op
Divide.iters      342862386  avgt    5  978.652 ± 2.005  ns/op

... which is perhaps not surprising. On another Neoverse machine I got
a result very similar to yours, about 15% faster with separate MUL/SUB.

To be honest with you, I hate very machine-specific performance tweaks.
The biggest problem is that testing is different on every kind of machine
if they all have machine-specific tweaks.

Given that this is a pretty rare case (integer modulo by a non-constant
value) and that the difference is small, do you really need it? I attached a
JMH test for more reliable testing.
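
For reference, the operation under discussion can be sketched in plain Java.
This is only an illustration of the arithmetic, not the attached benchmark and
not the patch: the class name RemainderSketch and the method remViaMulSub are
hypothetical, and the comments mapping each step to SDIV, MUL and SUB describe
the instruction sequence discussed in this thread, not what a particular JIT
is guaranteed to emit.

```java
public final class RemainderSketch {
    // Hypothetical helper: computes dividend % divisor the way the
    // divide-then-multiply-subtract sequence does.
    static int remViaMulSub(int dividend, int divisor) {
        int quotient = dividend / divisor;  // corresponds to SDIV
        int product = quotient * divisor;   // corresponds to MUL
        return dividend - product;          // corresponds to SUB; MSUB fuses the last two steps
    }

    public static void main(String[] args) {
        int[][] cases = {{17, 5}, {-17, 5}, {17, -5}, {Integer.MIN_VALUE, -1}};
        for (int[] c : cases) {
            int expected = c[0] % c[1];
            int actual = remViaMulSub(c[0], c[1]);
            System.out.println(c[0] + " rem " + c[1] + ": % gives " + expected
                    + ", mul/sub gives " + actual);
        }
    }
}
```

Both forms compute the same value for every non-zero divisor, including the
overflowing case Integer.MIN_VALUE / -1, so the only difference between the
fused and the separate sequence is how a given core schedules it. A zero
divisor throws ArithmeticException in either form.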

Finally, please send questions to hotspot-dev, with "AArch64" in the title,
or I may not see them.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Divide.java
Type: text/x-java
Size: 520 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/aarch64-port-dev/attachments/20240412/fcbcad5b/Divide.java>

