RFR: 8321003: RISC-V: C2 MulReductionVI [v3]

Tue Feb 18 03:18:11 UTC 2025

On Thu, 13 Feb 2025 09:04:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review this patch to implement MulReductionVI/MulReductionVL/MulReductionVF/MulReductionVD?
>> This optimization is mainly for the vector API.
>> On riscv, there is no straightforward instructions to do it, but we can do it with a reduction tree, which could reduce the time complexity to lg(N).
>> 
>> 
>> Thanks
>> 
>> ## Test
>> 
>> ### jtreg
>> test/jdk/jdk/incubator/vector/
>> 
>> ### Performance
>> 
>> run on bananapi
>> 
>> master vs patch
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">
>> Benchmark | (size) | Mode | Cnt | Score - master | Error - patch | Score - patch | Error - patch | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> ByteMaxVector.MULLanes | 1024 | avgt | 10 | 11170.052 | 499.676 | 1294.424 | 8.346 | ns/op | 88.40%
>> ByteMaxVector.MULMaskedLanes | 1024 | avgt | 10 | 12514.587 | 55.06 | 1413.028 | 0.259 | ns/op | 88.70%
>> DoubleMaxVector.MULLanes | 1024 | avgt | 10 | 57672.51 | 1750.417 | 4775.633 | 1.454 | ns/op | 91.70%
>> DoubleMaxVector.MULMaskedLanes | 1024 | avgt | 10 | 63656.523 | 1063.048 | 5899.259 | 1.692 | ns/op | 90.70%
>> FloatMaxVector.MULLanes | 1024 | avgt | 10 | 30997.218 | 728.73 | 2473.069 | 5.84 | ns/op | 92.00%
>> FloatMaxVector.MULMaskedLanes | 1024 | avgt | 10 | 35515.329 | 227.873 | 3284.17 | 0.608 | ns/op | 90.80%
>> IntMaxVector.MULLanes | 1024 | avgt | 10 | 31130.453 | 878.261 | 3304.118 | 5.96 | ns/op | 89.40%
>> IntMaxVector.MULMaskedLanes | 1024 | avgt | 10 | 36851.976 | 394.001 | 3969.407 | 0.511 | ns/op | 89.20%
>> LongMaxVector.MULLanes | 1024 | avgt | 10 | 58795.752 | 1030.985 | 6883.995 | 3.15 | ns/op | 88.30%
>> LongMaxVector.MULMaskedLanes | 1024 | avgt | 10 | 63642.904 | 386.521 | 7892.735 | 9.359 | ns/op | 87.60%
>> ShortMaxVector.MULLanes | 1024 | avgt | 10 | 16857.441 | 762.428 | 2287.141 | 0.186 | ns/op | 86.40%
>> ShortMaxVector.MULMaskedLanes | 1024 | avgt | 10 | 21171.375 | 74.684 | 2532.913 | 0.274 | ns/op | 88.00%
>> 
>> </google-sheets-html-origin>
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   mimor

Thanks for the update. Several more comments after another look.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2961:

> 2959:                                               BasicType bt, uint vector_length, VectorMask vm) {
> 2960:   assert(bt == T_BYTE || bt == T_SHORT || bt == T_INT || bt == T_LONG, "unsupported element type");
> 2961:   uint len = vector_length / type2aelembytes(bt);

Why not pass the number of elements directly by param `vector_length`? On the call sites in the ad file, `Matcher::vector_length(this)` gives exactly what you want.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 3001:

> 2999:                                         BasicType bt, uint vector_length, VectorMask vm) {
> 3000:   assert(bt == T_FLOAT || bt == T_DOUBLE, "unsupported element type");
> 3001:   uint len = vector_length / type2aelembytes(bt);

Similar here.

src/hotspot/cpu/riscv/riscv_v.ad line 106:

> 104:       case Op_MulReductionVF:
> 105:       case Op_MulReductionVD:
> 106:         if (vlen < 4) {

A code comment about why this required would help understand.

src/hotspot/cpu/riscv/riscv_v.ad line 2464:

> 2462: %}
> 2463: 
> 2464: instruct reduce_mulL(iRegLNoSp dst, iRegLNoSp isrc, vReg vsrc,

Suggestion: `iRegL isrc`

src/hotspot/cpu/riscv/riscv_v.ad line 2479:

> 2477: %}
> 2478: 
> 2479: instruct reduce_mulL_masked(iRegLNoSp dst, iRegLNoSp isrc, vReg vsrc,

Suggestion: `iRegL isrc`

-------------

PR Review: https://git.openjdk.org/jdk/pull/23580#pullrequestreview-2622159419
PR Review Comment: https://git.openjdk.org/jdk/pull/23580#discussion_r1958962339
PR Review Comment: https://git.openjdk.org/jdk/pull/23580#discussion_r1959000596
PR Review Comment: https://git.openjdk.org/jdk/pull/23580#discussion_r1958905457
PR Review Comment: https://git.openjdk.org/jdk/pull/23580#discussion_r1959002320
PR Review Comment: https://git.openjdk.org/jdk/pull/23580#discussion_r1959002749