[aarch64-port-dev] Math: optimization for doing remainder
Andrew Haley
aph-open at littlepinkcloud.com
Fri Apr 12 18:10:11 UTC 2024
On 4/12/24 11:37, Jin Guojie wrote:
> According to the technical documentation of Arm Neoverse N2, the MSUB instruction uses the same ALU as SDIV.
> After testing, it was found that the combination of MUL/SUB is much faster than MSUB.
> Below is a patch I wrote to optimize the remainder operation.
> Testing with actual Java programs shows that the performance of this operation has indeed improved significantly.
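For context: AArch64 has no integer remainder instruction, so a remainder is derived from the quotient using the identity a % b == a - (a / b) * b, which the Java Language Specification guarantees for int (JLS 15.17.3). MSUB fuses the final multiply and subtract into one instruction; the proposal replaces it with a separate MUL and SUB. The sketch below is only an illustration: the class and method names are hypothetical, and Java source cannot select either instruction sequence (the JIT does that). The comments merely indicate which AArch64 sequence each helper's arithmetic corresponds to.

```java
// Hypothetical illustration: Java source cannot choose MSUB vs MUL/SUB;
// the JIT picks the instructions. The comments only show which AArch64
// sequence each helper's arithmetic corresponds to.
final class RemainderSketch {

    // SDIV then MSUB: "MSUB w_r, w_q, w_b, w_a" computes w_a - w_q * w_b
    // in a single instruction.
    static int remFused(int a, int b) {
        int q = a / b;       // SDIV
        return a - q * b;    // MSUB (multiply and subtract fused)
    }

    // SDIV, then a separate MUL and SUB: the combination described above.
    static int remSplit(int a, int b) {
        int q = a / b;       // SDIV
        int t = q * b;       // MUL
        return a - t;        // SUB
    }

    public static void main(String[] args) {
        // Both forms must agree with Java's % operator, including the
        // Integer.MIN_VALUE / -1 case (the quotient wraps, the remainder is 0).
        int[][] cases = {
            {17, 5}, {-17, 5}, {17, -5}, {-17, -5},
            {Integer.MIN_VALUE, -1}, {1_000_003, 97}
        };
        for (int[] c : cases) {
            int expected = c[0] % c[1];
            if (remFused(c[0], c[1]) != expected || remSplit(c[0], c[1]) != expected) {
                throw new AssertionError("mismatch for " + c[0] + " % " + c[1]);
            }
        }
        System.out.println("all cases match");
    }
}
```

Since the two helpers compute exactly the same values, any speed difference between the sequences comes purely from how a given core executes them, which is why the measurements below can differ between machines.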
Interesting. I wrote a JMH test for this, and on Apple M1 separate MUL/SUB
is dramatically worse:
Before (MSUB):
  Divide.iters         32  avgt  5  650.431 ± 5.890  ns/op
  Divide.iters  342862386  avgt  5  650.597 ± 4.460  ns/op
After (separate MUL/SUB):
  Divide.iters         32  avgt  5  979.338 ± 1.266  ns/op
  Divide.iters  342862386  avgt  5  978.652 ± 2.005  ns/op
... which is perhaps not surprising. On another Neoverse machine I got
a result very similar to yours, about 15% faster with separate MUL/SUB.
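To put "dramatically worse" into numbers, here is the ratio of the M1 scores above (After divided by Before) for each parameter value. It ignores the ± error bounds, which are all under 1% of the scores.

```latex
\frac{979.338\ \text{ns/op}}{650.431\ \text{ns/op}} \approx 1.506,
\qquad
\frac{978.652\ \text{ns/op}}{650.597\ \text{ns/op}} \approx 1.504
```

That is roughly 50% more time per operation with separate MUL/SUB on M1, against the roughly 15% gain seen on the Neoverse machine.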
To be honest with you, I hate highly machine-specific performance tweaks.
The biggest problem is that if every kind of machine gets its own tweaks,
testing becomes different on every one of them.
Given that this is a pretty rare case (integer modulo by a non-constant
value) and that the difference is small, do you really need it? I have
attached a JMH test for more reliable testing.
Finally, please send questions to hotspot-dev, with "AArch64" in the title,
or I may not see them.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Attachment: Divide.java (text/x-java, 520 bytes)
<https://mail.openjdk.org/pipermail/aarch64-port-dev/attachments/20240412/fcbcad5b/Divide.java>