RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar

Hamlin Li mli at openjdk.org
Wed Mar 5 09:18:54 UTC 2025


On Wed, 5 Mar 2025 02:03:18 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Hi,
>> Can you help to review this patch?
>> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
>> 
>> ## Performance
>> 
>> data
>> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error | Score - patch | Error | Units | Improvement (master/patch)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.564 | 0.076 | 219.597 | 0.081 | ns/op | 1
>> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 358.873 | 0.575 | 355.011 | 0.07 | ns/op | 1.011
>> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.361 | 0.189 | 581.832 | 0.006 | ns/op | 1.001
>> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1035.633 | 0.239 | 1034.854 | 0.284 | ns/op | 1.001
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 4951.702 | 0.194 | 2593.835 | 0.066 | ns/op | 1.909
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 9867.909 | 0.314 | 5167.568 | 0.162 | ns/op | 1.91
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 21324.318 | 1.651 | 10016.456 | 1.07 | ns/op | 2.129
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 42618.969 | 3.877 | 19985.662 | 1.233 | ns/op | 2.132
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 2811.45 | 0.441 | 2701.419 | 140.699 | ns/op | 1.041
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 5568.561 | 0.654 | 5577.598 | 1.123 | ns/op | 0.998
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 11109.108 | 1.7 | 11095.644 | 0.644 | ns/op | 1.001
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 20017.095 | 0.778 | 21560.165 | 0.515 | ns/op | 0.928
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 20864.303 | 23.768 | 1345.192 | 0.274 | ns/op | 15.51
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 43596.262 | 102.075 | 2580.035 | 0.397 | ns/op | 16.898
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 91565.81...
>
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141:
> 
>> 2139:   if (ft == FLOAT_TYPE::half_precision) {
>> 2140:     assert_cond(UseZfh);
>> 2141:   }
> 
> Suggestion: `assert_cond((ft != FLOAT_TYPE::half_precision) || UseZfh);`

OK, will fix it.
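
For reference, a minimal sketch of why the one-line form checks the same thing as the guarded form. The enum and the two helper functions below are hypothetical stand-ins, not the HotSpot definitions; `use_zfh` plays the role of `UseZfh`, and each function returns whether the assertion would hold instead of asserting.

```cpp
#include <cassert>

// Hypothetical stand-in for FLOAT_TYPE in the quoted snippet (not the HotSpot type).
enum class FloatType { half_precision, single_precision, double_precision };

// Original shape: the check only applies when the type is half precision.
bool guarded_check_passes(FloatType ft, bool use_zfh) {
  if (ft == FloatType::half_precision) {
    return use_zfh;
  }
  return true;
}

// Suggested shape: a single condition, "not half precision, or Zfh is enabled".
bool single_check_passes(FloatType ft, bool use_zfh) {
  return (ft != FloatType::half_precision) || use_zfh;
}

// Compare both shapes over every combination of type and flag.
bool forms_agree() {
  const FloatType types[] = {FloatType::half_precision,
                             FloatType::single_precision,
                             FloatType::double_precision};
  const bool flags[] = {false, true};
  for (FloatType ft : types) {
    for (bool use_zfh : flags) {
      if (guarded_check_passes(ft, use_zfh) != single_check_passes(ft, use_zfh)) {
        return false;
      }
    }
  }
  return true;
}
```

The only failing case in either form is half precision with the flag off, so the rewrite is the usual "if A then B" to "!A || B" transformation.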

> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 6392:
> 
>> 6390:   fmv_h_x(dst, src);
>> 6391:   fcvt_s_h(dst, dst);
>> 6392:   j(DONE);
> 
> It looks confusing to me to have pairs like `float16_to_float` and `float16_to_float_c2`. As there is only one use of `float16_to_float`, in file `src/hotspot/cpu/riscv/stubGenerator_riscv.cpp`, I would suggest we inline the code at the call site. Then we could remove this assembler routine and rename `float16_to_float_c2` to `float16_to_float`. Also, when inlining the code at the call site, we could replace this `j(DONE)` with a direct return, thus saving one jump instruction.

Good suggestion!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980998175
PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980998037


More information about the hotspot-dev mailing list