RFR: 8315612: RISC-V: intrinsic for unsignedMultiplyHigh [v2]
null
duke at openjdk.org
Fri Nov 3 09:28:14 UTC 2023
On Tue, 5 Sep 2023 07:04:06 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:
>> Hello
>> Please review this simple patch, it add c2 implementation of intrinsic for unsignedMultiplyHigh.
>> The generated code changes from:
>>
>> 0x0000003fbcfb12f8: mulh t4,t2,t3
>> 2.99% 0x0000003fbcfb12fc: srai t5,t2,0x3f
>> 0x0000003fbcfb1300: and t5,t5,t3
>> 0x0000003fbcfb1304: srai t3,t3,0x3f
>> 0x0000003fbcfb1308: and t2,t3,t2
>> 0x0000003fbcfb130c: add t4,t4,t5
>> 0x0000003fbcfb130e: add t2,t2,t4 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
>>
>> to
>>
>> 0x0000003fdcfb6668: mulhu t2,t2,t3 ;*invokestatic unsignedMultiplyHigh {reexecute=0 rethrow=0 return_oop=0}
>>
>>
>> Clear code size reduction and potentially some performance boost.
>> on hifive I can see the perf boost:
>>
>> before:
>> MathBench.unsignedMultiplyHighLongLong 0 thrpt 8 67459.527 ± 10110.941 ops/ms
>>
>> after:
>> MathBench.unsignedMultiplyHighLongLong 0 thrpt 8 86207.949 ± 8636.131 ops/ms
>>
>> However on thead the jmh benchmark unsignedMultiplyHighLongLong didn't show any difference as the hottest place is the fence ( getfield isDone ):
>>
>> 0x0000003fdcfb6660: ld t3,64(t4)
>> 0x0000003fdcfb6664: ld t2,56(t4)
>> 0x0000003fdcfb6668: mulhu t2,t2,t3 ;*invokestatic unsignedMultiplyHigh {reexecute=0 rethrow=0 return_oop=0}
>> ; - org.openjdk.bench.java.lang.MathBench::unsignedMultiplyHighLongLong at 8 (line 545)
>> ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 17 (line 119)
>> 3.12% 0x0000003fdcfb666c: lbu t3,148(s3) ;*invokestatic consumeCompiler {reexecute=0 rethrow=0 return_oop=0}
>> ; - org.openjdk.jmh.infra.Blackhole::consume at 7 (line 393)
>> ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 20 (line 119)
>> 0x0000003fdcfb6670: fence ir,iorw ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0}
>> ; - or...
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix nits
> Hello Please review this simple patch, it add c2 implementation of intrinsic for unsignedMultiplyHigh. The generated code changes from:
>
> ```
> 0x0000003fbcfb12f8: mulh t4,t2,t3
> 2.99% 0x0000003fbcfb12fc: srai t5,t2,0x3f
> 0x0000003fbcfb1300: and t5,t5,t3
> 0x0000003fbcfb1304: srai t3,t3,0x3f
> 0x0000003fbcfb1308: and t2,t3,t2
> 0x0000003fbcfb130c: add t4,t4,t5
> 0x0000003fbcfb130e: add t2,t2,t4 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
> ```
>
> to
>
> ```
> 0x0000003fdcfb6668: mulhu t2,t2,t3 ;*invokestatic unsignedMultiplyHigh {reexecute=0 rethrow=0 return_oop=0}
> ```
>
> Clear code size reduction and potentially some performance boost. on hifive I can see the perf boost:
>
> ```
> before:
> MathBench.unsignedMultiplyHighLongLong 0 thrpt 8 67459.527 ± 10110.941 ops/ms
>
> after:
> MathBench.unsignedMultiplyHighLongLong 0 thrpt 8 86207.949 ± 8636.131 ops/ms
> ```
>
> However on thead the jmh benchmark unsignedMultiplyHighLongLong didn't show any difference as the hottest place is the fence ( getfield isDone ):
>
> ```
> 0x0000003fdcfb6660: ld t3,64(t4)
> 0x0000003fdcfb6664: ld t2,56(t4)
> 0x0000003fdcfb6668: mulhu t2,t2,t3 ;*invokestatic unsignedMultiplyHigh {reexecute=0 rethrow=0 return_oop=0}
> ; - org.openjdk.bench.java.lang.MathBench::unsignedMultiplyHighLongLong at 8 (line 545)
> ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 17 (line 119)
> 3.12% 0x0000003fdcfb666c: lbu t3,148(s3) ;*invokestatic consumeCompiler {reexecute=0 rethrow=0 return_oop=0}
> ; - org.openjdk.jmh.infra.Blackhole::consume at 7 (line 393)
> ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 20 (line 119)
> 0x0000003fdcfb6670: fence ir,iorw ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0}
> ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 30 (line 121)
> 62.78% 0x0000003fdcfb6674: addi s2,s2,1 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
> ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 26 (line 120)
> ```
>
> tier1/tier2 tbd
>
> ### Progress
> * [x] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer))
> * [x] Change must not contain extraneous whitespace
> * [x] Commit message must refer to an issue
>
> ### Issue
> * [JDK-8315612](https://bugs.openjdk.org/browse/JDK-8315612): RISC-V: intrinsic for unsignedMultiplyHigh (**Enhancement** - P4)
>
> ### Reviewers
> * [Fei Yang](https://openjdk.org/census#fyang) (@RealFYang - **Reviewer**) ⚠️ Review applies to [55f34287](https://git.openjdk.org/jdk/pull/15558/files/55f34287facffedc6573b9273b6c06fb9400a926)
>
> ### Reviewing
> Using `git`
> Using Skara CLI tools
> Using diff file
> ### Webrev
> [Link to Webrev Comment](https://git.openjdk.org/jdk/pull/15558#issuecomment-1704978958)
Hi Vladimir Kempik the hottest place were detected by perfasm ?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15558#issuecomment-1792110457
More information about the hotspot-compiler-dev
mailing list