RFR: 8315612: RISC-V: intrinsic for unsignedMultiplyHigh

Vladimir Kempik vkempik at openjdk.org
Mon Sep 4 12:28:40 UTC 2023


On Mon, 4 Sep 2023 09:54:55 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

> Hello
> Please review this simple patch, it adds a C2 implementation of the intrinsic for unsignedMultiplyHigh.
> The generated code changes from:
> 
>             0x0000003fbcfb12f8:   mulh	t4,t2,t3
>    2.99%    0x0000003fbcfb12fc:   srai	t5,t2,0x3f
>             0x0000003fbcfb1300:   and	t5,t5,t3
>             0x0000003fbcfb1304:   srai	t3,t3,0x3f
>             0x0000003fbcfb1308:   and	t2,t3,t2
>             0x0000003fbcfb130c:   add	t4,t4,t5
>             0x0000003fbcfb130e:   add	t2,t2,t4                    ;*ladd {reexecute=0 rethrow=0 return_oop=0}
> 
> to
> 
>             0x0000003fdcfb6668:   mulhu	t2,t2,t3                    ;*invokestatic unsignedMultiplyHigh {reexecute=0 rethrow=0 return_oop=0}
> 
> 
> This is a clear code size reduction and potentially a performance boost.
> On hifive I can see the perf boost:
> 
> before:
> MathBench.unsignedMultiplyHighLongLong       0  thrpt    8  67459.527 ± 10110.941  ops/ms
> 
> after:
> MathBench.unsignedMultiplyHighLongLong       0  thrpt    8  86207.949 ± 8636.131  ops/ms
> 
> However, on thead the JMH benchmark unsignedMultiplyHighLongLong didn't show any difference, as the hottest spot is the fence (getfield isDone):
> 
>             0x0000003fdcfb6660:   ld	t3,64(t4)
>             0x0000003fdcfb6664:   ld	t2,56(t4)
>             0x0000003fdcfb6668:   mulhu	t2,t2,t3                    ;*invokestatic unsignedMultiplyHigh {reexecute=0 rethrow=0 return_oop=0}
>                                                                       ; - org.openjdk.bench.java.lang.MathBench::unsignedMultiplyHighLongLong at 8 (line 545)
>                                                                       ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 17 (line 119)
>    3.12%    0x0000003fdcfb666c:   lbu	t3,148(s3)                  ;*invokestatic consumeCompiler {reexecute=0 rethrow=0 return_oop=0}
>                                                                       ; - org.openjdk.jmh.infra.Blackhole::consume at 7 (line 393)
>                                                                       ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 20 (line 119)
>             0x0000003fdcfb6670:   fence	ir,iorw                     ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0}
>                                                                       ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLo...
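
For reference, the seven-instruction "before" sequence in the quoted listing appears to compute the unsigned high word from the signed one: mulh, plus two sign-mask corrections (srai by 63, and, add). Below is a minimal Java sketch of that identity, checked against a BigInteger reference. The class and method names are made up for illustration and are not part of the patch.

```java
import java.math.BigInteger;

public class UnsignedMulHighSketch {

    // Hypothetical helper: unsigned high 64 bits built from the signed high word,
    // mirroring the mulh / srai / and / add sequence in the "before" listing.
    static long viaSignedMulh(long x, long y) {
        long hi = Math.multiplyHigh(x, y); // signed high 64 bits (mulh)
        hi += (x >> 63) & y;               // adds y when x is negative
        hi += (y >> 63) & x;               // adds x when y is negative
        return hi;
    }

    // Reference: treat both operands as unsigned 64-bit values and take
    // bits 64..127 of the full 128-bit product.
    static long reference(long x, long y) {
        BigInteger mask = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        BigInteger ux = BigInteger.valueOf(x).and(mask);
        BigInteger uy = BigInteger.valueOf(y).and(mask);
        return ux.multiply(uy).shiftRight(64).longValue();
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 747L, 13L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (viaSignedMulh(a, b) != reference(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("all samples agree");
    }
}
```

With the intrinsic, the whole of viaSignedMulh collapses into the single mulhu shown in the "after" listing.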

When I increase the test difficulty from

    public long unsignedMultiplyHighLongLong() {
        return Math.unsignedMultiplyHigh(long747, long13);
    }

to

    public long unsignedMultiplyHighLongLong() {
        return Math.unsignedMultiplyHigh(long747, long13) + Math.unsignedMultiplyHigh(long13, long747) + Math.unsignedMultiplyHigh(long747, long2);
    }


then I start to see a difference on thead as well:

before the patch:

Benchmark                               (seed)   Mode  Cnt      Score    Error   Units
MathBench.unsignedMultiplyHighLongLong       0  thrpt    8  32848.186 ± 490.924  ops/ms


after the patch:

Benchmark                               (seed)   Mode  Cnt      Score    Error   Units
MathBench.unsignedMultiplyHighLongLong       0  thrpt    8  35156.335 ± 47.592  ops/ms
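
To put the scores in relative terms, here is a small sketch that turns a before/after throughput pair into a percentage gain. The helper name is made up for illustration. Note that the hifive pair comes from the original single-call benchmark and the thead pair from the three-call variant, so the two percentages are not directly comparable; the hifive error bars are also wide.

```java
public class SpeedupSketch {

    // Hypothetical helper: relative throughput gain in percent.
    static double speedupPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/ms, copied from the benchmark output in this thread.
        System.out.printf("hifive: %.1f%%%n", speedupPercent(67459.527, 86207.949));
        System.out.printf("thead:  %.1f%%%n", speedupPercent(32848.186, 35156.335));
    }
}
```

This works out to roughly 28% on hifive and roughly 7% on thead.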

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15558#issuecomment-1705181260


More information about the hotspot-compiler-dev mailing list