RFR: 8357551: RISC-V: support CMoveF/D vectorization [v5]

Fri Nov 21 03:41:46 UTC 2025

On Tue, 18 Nov 2025 09:27:44 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> 
>> This PR adds CMoveF/D support on RISC-V, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop.
>> 
>> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231.
>> 
>> This work was previously proposed in https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison. That issue is now fixed, so the work can continue.
>> 
>> # Test
>> ## Jtreg
>> 
>> in progress...
>> 
>> ## Performance
>> 
>> Meaning of the column names:
>> * p: with patch
>> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on
>> * m: without patch
>> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on
>> 
>> #### Average improvement
>> 
>> NOTE: On its own, this PR brings a performance benefit for `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, and `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231.
>> 
>> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv.
>> 
>> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v)
>> -- | -- | -- | --
>> 1.022782609 | 2.198717391 | 2.162673913 | 2.199
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   replace assert with log_warning

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1590:

> 1588:     // jump if cmp1 < cmp2 or either is NaN
> 1589:     // not jump (i.e. move src to dst) if cmp1 >= cmp2
> 1590:     float_blt(cmp1, cmp2, no_set);

I compared this with the existing `MacroAssembler::cmov_cmp_fp_ge` [1] and noticed a difference in `NaN` handling. In `MacroAssembler::cmov_cmp_fp_ge`, we set the `is_unordered` param to true when calling `float_blt` or `double_blt`, which is not the case here. I assume we need similar handling here as well, right?

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1338
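
To make the concern concrete, here is a minimal Java sketch of the semantics in question. It is only a model, not the HotSpot code: `CMoveNaNSketch`, `selectGe`, `cmovGeModel`, and the `branchOnUnordered` flag are hypothetical names, and the assumption (taken from the quoted comment "jump if cmp1 < cmp2 or either is NaN") is that the unordered flag decides whether the branch is also taken when an operand is `NaN`.

```java
public class CMoveNaNSketch {
    // Reference semantics: what Java requires for `a >= b ? src : dst`.
    // Any comparison involving NaN is false, so NaN must select dst.
    static float selectGe(float a, float b, float src, float dst) {
        return a >= b ? src : dst;
    }

    // Hypothetical model of a "branch over the move" sequence:
    //   jump (keep dst) if cmp1 < cmp2, otherwise fall through and move src to dst.
    // branchOnUnordered is an assumed stand-in for the unordered flag: when true,
    // the jump is also taken if either operand is NaN.
    static float cmovGeModel(float cmp1, float cmp2, float src, float dst,
                             boolean branchOnUnordered) {
        boolean unordered = Float.isNaN(cmp1) || Float.isNaN(cmp2);
        boolean jump = (cmp1 < cmp2) || (branchOnUnordered && unordered);
        return jump ? dst : src;
    }

    public static void main(String[] args) {
        float src = 10f, dst = 20f;
        float[][] inputs = { {1f, 2f}, {2f, 1f}, {1f, 1f}, {Float.NaN, 1f}, {1f, Float.NaN} };
        for (float[] in : inputs) {
            System.out.println(in[0] + " >= " + in[1]
                + " expected=" + selectGe(in[0], in[1], src, dst)
                + " ordered-only=" + cmovGeModel(in[0], in[1], src, dst, false)
                + " unordered=" + cmovGeModel(in[0], in[1], src, dst, true));
        }
    }
}
```

Under this model, the ordered-only variant falls through and moves `src` when an operand is `NaN`, whereas Java's `>=` selects `dst`; the two agree only when the branch is also taken on unordered operands.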

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1636:

> 1634:     // jump if cmp1 <= cmp2 or either is NaN
> 1635:     // not jump (i.e. move src to dst) if cmp1 > cmp2
> 1636:     float_ble(cmp1, cmp2, no_set);

Same question here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2548424215
PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2548424568