RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv)

Tue Jun 3 13:48:57 UTC 2025

On Tue, 3 Jun 2025 13:03:39 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> There is an issue when the comparison is an unsigned one, e.g. `c[i] = Long.compareUnsigned(a[i], b[i]) > 0 ? 1.0 : 2.0;`, or `c[i] = (a[i] > b[i]) ? 1.0 : 2.0;` when a[]/b[] are char[].
>> 
>> It seems that unsigned comparison is currently not supported for superword vectorization? The unsigned information is lost, i.e. all the comparisons are treated as signed ones.
>> I checked the generated code, and it seems that when VectorMaskCmp is matched, `BoolTest::unsigned_compare & cond` is always 0 for the passed-in `cond` parameter.
>> (Vector API supports unsigned ones, as it passes in `cond` with `BoolTest::unsigned_compare` mask explicitly when the operator is in UGE/UGT/ULE/ULT.)
>
>> Seems currently the unsigned comparison is not supported for superword vectorization?
> 
> I think that currently only `float` and `double` are really implemented for CMove. Integer types are still to be added, see [JDK-8308841](https://bugs.openjdk.org/browse/JDK-8308841)
> C2 SuperWord: implement vectorization of integer CMove
> I hope we get to it soon, and then we can generally extend the combinations too. Like comparing `int`, but blending between `double`.
> 
> Maybe it would be better if for now you focus just on the `D/F` cases that are already supported on x86 and aarch64?

Thanks for the information!
I'll hold off on these PRs until integer CMove vectorization is fully supported.

At first, I also just planned to implement CMoveF/D on riscv and let it be vectorized automatically by the current C2 implementation.
But I found performance regressions for some type combinations (please check `table 1` below). There are two reasons. First, for some type combinations CMoveF/D cannot be vectorized because of the type size check in `SuperWord::is_velt_basic_type_compatible_use_def`. Second, the scalar implementation of CMoveF/D on riscv bloats the generated code after loop unrolling (because of the complicated implementation on riscv). Together, these two reasons lead to the performance regression in some cases.

table 1
Can be vectorized? | CMoveF | CMoveD
-- | -- | --
CmpI | V | X
CmpU | V | X
CmpL | X | V
CmpUL | X | V
CmpF | V | X
CmpD | X | V
CmpN | V | X
CmpP | X | V
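
To make the rows of `table 1` concrete, here is a minimal Java sketch of loops that would be expected to produce three of the combinations: CmpI with CMoveF, CmpI with CMoveD, and CmpL with CMoveD. The class and method names are hypothetical and only for illustration. Whether C2 actually emits these node shapes, and whether it vectorizes them, depends on the JDK build, flags, and platform, so this is an illustration of the source patterns rather than a benchmark from the PR.

```java
import java.util.Arrays;

public class CMoveShapes {
    // int compare (CmpI) selecting a float (CMoveF): compare and result are both 4 bytes.
    // Per table 1 this combination can be vectorized.
    public static void selectFloatByInt(int[] a, int[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (a[i] > b[i]) ? 1.0f : 2.0f;
        }
    }

    // int compare (CmpI) selecting a double (CMoveD): 4-byte compare, 8-byte result.
    // Per table 1 this combination is rejected by the type size check.
    public static void selectDoubleByInt(int[] a, int[] b, double[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (a[i] > b[i]) ? 1.0 : 2.0;
        }
    }

    // long compare (CmpL) selecting a double (CMoveD): compare and result are both 8 bytes.
    // Per table 1 this combination can be vectorized.
    public static void selectDoubleByLong(long[] a, long[] b, double[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (a[i] > b[i]) ? 1.0 : 2.0;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 3};
        int[] b = {2, 4, 3};
        float[] f = new float[a.length];
        double[] d = new double[a.length];
        selectFloatByInt(a, b, f);
        selectDoubleByInt(a, b, d);
        System.out.println(Arrays.toString(f));
        System.out.println(Arrays.toString(d));
    }
}
```

The three loops compute the same kind of select; only the element sizes of the compared values and of the selected values differ, which is exactly what the size check looks at.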

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123879603