On Fri, 2 Jul 2021 13:47:40 GMT, Andrew Haley <aph@openjdk.org> wrote:
> You can also do that branchlessly which might prove better
>
> ```
> long result = Math.multiplyHigh(x, y);
> result += (y & (x >> 63));
> result += (x & (y >> 63));
> return result;
> ```

I doubt very much that it would be better, because these days branch prediction is excellent, and we also have conditional select instructions. Exposing the condition helps C2 to eliminate it if the range of args is known. The `if` code is easier to understand.
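For reference, here is a minimal sketch of the two variants side by side, checked against a slow `BigInteger` computation. The branchless body is the one quoted above. The branching body is only an assumption about what "the `if` code" looks like, and the class and method names are hypothetical. Both rely on the same identity: reading a negative `long` as unsigned adds 2^64 to it, so the high word of the unsigned product is the signed high word plus `y` when `x < 0` and plus `x` when `y < 0`, modulo 2^64.

```java
import java.math.BigInteger;

// Hypothetical holder class; the names here are illustrative, not from the PR.
class UnsignedMulHi {
    // Branching variant: an assumed shape for the `if` code under discussion.
    public static long branchy(long x, long y) {
        long result = Math.multiplyHigh(x, y);
        if (x < 0) result += y; // x read as unsigned is x + 2^64
        if (y < 0) result += x; // y read as unsigned is y + 2^64
        return result;
    }

    // Branchless variant, as quoted in the message.
    public static long branchless(long x, long y) {
        long result = Math.multiplyHigh(x, y);
        result += (y & (x >> 63)); // x >> 63 is all ones when x < 0, else zero
        result += (x & (y >> 63)); // y >> 63 is all ones when y < 0, else zero
        return result;
    }

    // Slow reference: multiply the unsigned values exactly and keep bits 64..127.
    public static long reference(long x, long y) {
        BigInteger ux = new BigInteger(Long.toUnsignedString(x));
        BigInteger uy = new BigInteger(Long.toUnsignedString(y));
        return ux.multiply(uy).shiftRight(64).longValue();
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE, 0x123456789ABCDEFL};
        for (long x : samples) {
            for (long y : samples) {
                long expected = reference(x, y);
                if (branchy(x, y) != expected || branchless(x, y) != expected) {
                    throw new AssertionError("mismatch for x=" + x + ", y=" + y);
                }
            }
        }
        System.out.println("all variants agree");
    }
}
```

The two variants compute the same value, so the choice between them is purely about readability and what the JIT can do with each form.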
Benchmark results, with one of the operands changing signs every iteration, 1000 iterations:
    Benchmark                      Mode  Cnt     Score    Error  Units
    MulHiTest.mulHiTest1 (aph)     avgt    3  1570.587 ± 16.602  ns/op
    MulHiTest.mulHiTest2 (adinn)   avgt    3  2237.637 ±  4.740  ns/op
In any case, note that with this optimization the unsigned mulHi is in the nanosecond range per call (the scores above cover 1000 iterations), so Good Enough. IMO.
Weirdly, it's the other way around on AArch64, though there's little in it:

    Benchmark              Mode  Cnt     Score   Error  Units
    MulHiTest.mulHiTest1   avgt    3  1492.108 ± 0.301  ns/op
    MulHiTest.mulHiTest2   avgt    3  1219.521 ± 1.516  ns/op

But this is only in the case where we have unpredictable branches. Go with simple and easy to understand; it doesn't much matter.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4644