RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar

Fei Yang fyang at openjdk.org
Wed Mar 5 02:07:58 UTC 2025


On Fri, 28 Feb 2025 14:34:47 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this patch?
> It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv.
> 
> ## Performance
> 
> data
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">
> Benchmark | (vectorDim) | Mode | Cnt | Score - master | Error - master | Score - patch | Error - patch | Units | Improvement (master/patch)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float16OperationsBenchmark.absBenchmark | 256 | avgt | 10 | 219.613 | 0.039 | 219.549 | 0.135 | ns/op | 1
> Float16OperationsBenchmark.absBenchmark | 512 | avgt | 10 | 354.956 | 0.109 | 354.987 | 0.178 | ns/op | 1
> Float16OperationsBenchmark.absBenchmark | 1024 | avgt | 10 | 582.314 | 1.596 | 581.873 | 0.084 | ns/op | 1.001
> Float16OperationsBenchmark.absBenchmark | 2048 | avgt | 10 | 1034.657 | 0.259 | 1035.217 | 0.184 | ns/op | 0.999
> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 11068.314 | 1.819 | 2594 | 0.207 | ns/op | 4.267
> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 23197.406 | 9.984 | 5167.862 | 0.111 | ns/op | 4.489
> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 46773.589 | 7.181 | 10015.53 | 0.412 | ns/op | 4.67
> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 93579.771 | 17.124 | 19986.546 | 0.331 | ns/op | 4.682
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 256 | avgt | 10 | 5853.504 | 2.676 | 2628.591 | 0.793 | ns/op | 2.227
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 512 | avgt | 10 | 11657.465 | 0.735 | 5201.419 | 1.149 | ns/op | 2.241
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 1024 | avgt | 10 | 18156.051 | 6.518 | 10377.444 | 0.206 | ns/op | 1.75
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 | 2048 | avgt | 10 | 36463.97 | 31.391 | 20008.274 | 3.274 | ns/op | 1.822
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 256 | avgt | 10 | 52423.99 | 153.302 | 1345.107 | 0.055 | ns/op | 38.974
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 512 | avgt | 10 | 95697.45 | 423.436 | 2671.999 | 0.456 | ns/op | 35.815
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 | 1024 | avgt | 10 | 188838.13 | 1248.92 | 5044.997 | 1.468 | ns...

Hi, Thanks for working on this part. Some initial comments after a cursory look.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141:

> 2139:   if (ft == FLOAT_TYPE::half_precision) {
> 2140:     assert_cond(UseZfh);
> 2141:   }

Suggestion: `assert_cond((ft != FLOAT_TYPE::half_precision) || UseZfh);`

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 6392:

> 6390:   fmv_h_x(dst, src);
> 6391:   fcvt_s_h(dst, dst);
> 6392:   j(DONE);

It looks to me confusing to have pairs like `float16_to_float` and `float16_to_float_c2`. As there is only one use for `float16_to_float` in file `src/hotspot/cpu/riscv/stubGenerator_riscv.cpp`, I would suggest we inline the code in the callsite. Then we could remove this assembler routine and rename `float16_to_float_c2` to `float16_to_float`. Also when inlining the code in the callsite, we could replace this `j(DONE)` with a direct return, thus saving one jump instruction.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23844#pullrequestreview-2659617194
PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980550524
PR Review Comment: https://git.openjdk.org/jdk/pull/23844#discussion_r1980537352


More information about the hotspot-dev mailing list