[10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long)

Mon Sep 25 15:44:06 UTC 2017

On 25.09.2017 18:09, Andrew Haley wrote:
> On 20/09/17 14:08, Dmitrij Pochepko wrote:
>> I've created a small JMH benchmark:
>> http://cr.openjdk.java.net/~dpochepk/8187684/MultiplyHighBench.java to
>> test the improved performance and measured it on aarch64(t88, R-Pi) and
>> x86_64(i7-4770K). The benchmark shows about a 2.5x improvement on
>> aarch64 and about 2x on x86_64
> By the way, this benchmark:
>
>          for (int i = 0; i < 100; i++) {
>              op1 = Math.multiplyHigh(op1, op2++);
>          }
>          return Math.multiplyHigh(op1, op2);
>
> measures the latency of the multiplyHigh, not the throughput, because
> each iteration depends on the previous one.  I don't know if that was
> your intent, but I would imagine we're more interested in throughput.
> Fast processors can issue a mulh every few clock cycles, but their
> latency may be considerably longer.
>
You're right. I've changed the benchmark to:

         long op = System.currentTimeMillis();
         long accum = 0;
         for (int i = 0; i < 10000; i++) {
             accum += Math.multiplyHigh(op + i, op + i);
         }
         return accum;

and it shows an even larger improvement: about 3.5x on aarch64.
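The two loop shapes can be compared side by side in the hypothetical stand-alone class below. The class name, the method names and the fixed seed are illustrative assumptions, not taken from the linked MultiplyHighBench.java, and Math.multiplyHigh needs JDK 9 or later. The sketch only shows the data dependencies. Actual timing still needs a harness such as JMH, because a hand-rolled timing loop is distorted by JIT warm-up and dead-code elimination.

```java
// Hypothetical stand-alone sketch of the two benchmark loop shapes,
// written without JMH. Names and the fixed seed are illustrative only.
public class MultiplyHighShapes {

    // Latency-bound shape (the original benchmark): every multiplyHigh
    // consumes the result of the previous one, so the multiplies cannot
    // overlap in the pipeline.
    static long dependentChain(long op1, long op2) {
        for (int i = 0; i < 100; i++) {
            op1 = Math.multiplyHigh(op1, op2++);
        }
        return Math.multiplyHigh(op1, op2);
    }

    // Throughput-bound shape (the revised benchmark): each multiplyHigh
    // depends only on the loop index, so several multiplies can be in
    // flight at once; only the cheap addition into accum is serialized.
    static long independentSum(long op) {
        long accum = 0;
        for (int i = 0; i < 10000; i++) {
            accum += Math.multiplyHigh(op + i, op + i);
        }
        return accum;
    }

    public static void main(String[] args) {
        // A fixed seed instead of System.currentTimeMillis() keeps the
        // results reproducible between runs.
        long seed = 1_506_354_246_000L;
        System.out.println(dependentChain(seed, 3L));
        System.out.println(independentSum(seed));
    }
}
```

In dependentChain the next multiply cannot start until the previous one has produced its result, so its running time is roughly the iteration count times the mulh latency. In independentSum a pipelined multiplier can work on several products at once, which is why this shape is the better proxy for throughput.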

Thank you for noticing.