RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv)

Emanuel Peter epeter at openjdk.org
Tue Jun 3 13:52:56 UTC 2025


On Tue, 3 Jun 2025 13:46:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

>>> It seems unsigned comparison is currently not supported for SuperWord vectorization?
>> 
>> I think that currently only `float` and `double` CMove are really implemented. Integer types are still to be added, see [JDK-8308841](https://bugs.openjdk.org/browse/JDK-8308841) (C2 SuperWord: implement vectorization of integer CMove).
>> I hope we get to it soon. Then we can extend the supported combinations more generally too, for example comparing `int`s but blending between `double`s.
>> 
>> Maybe it would be better to focus for now on just the `D/F` cases that are already supported on x86 and aarch64?
>
> Thanks for the information!
> I'll hold off on these PRs until integer CMove vectorization is fully supported.
> 
> At first, I also planned just to implement CMoveF/D on riscv and let them be vectorized automatically by the current C2 implementation.
> But I found performance regressions for some type combinations (see `table 1` below). There are two reasons. First, for some type combinations CMoveF/D cannot be vectorized, because of the type size check in `SuperWord::is_velt_basic_type_compatible_use_def`. Second, the scalar implementation of CMoveF/D on riscv is complicated, so it bloats the generated code after loop unrolling. Together, these lead to performance regressions in some cases.
> 
> Table 1: Can the CMove be vectorized? (V = yes, X = no)
> 
> Compare | CMoveF | CMoveD
> -- | -- | --
> CmpI | V | X
> CmpU | V | X
> CmpL | X | V
> CmpUL | X | V
> CmpF | V | X
> CmpD | X | V
> CmpN | V | X
> CmpP | X | V

Yes, getting all of this right, and with optimal performance, is tricky... @jaskarth is working on https://github.com/openjdk/jdk/pull/23413, which will change `SuperWord::is_velt_basic_type_compatible_use_def`, so we will also have to see how the two play together.
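Table 1 lines up with a simple size rule: a combination shows V exactly when the compared values and the selected values have the same size in bytes. The Java sketch below is a hypothetical model of that rule, not HotSpot code and not the actual logic of `SuperWord::is_velt_basic_type_compatible_use_def`. The names `CMoveSizeModel`, `canVectorize`, and `blend` are illustrative, and the sizes assume 32-bit compressed oops for CmpN and 64-bit pointers for CmpP on riscv64. `blend` shows the kind of source loop discussed above (comparing `int`s, blending `double`s); whether C2 actually turns it into a CmpI feeding a CMoveD depends on its own heuristics.

```java
public class CMoveSizeModel {

    // Compare node kinds from table 1, with the assumed size in bytes of the
    // compared values (CmpN: compressed oops, CmpP: pointers on a 64-bit target).
    enum Cmp {
        CmpI(4), CmpU(4), CmpL(8), CmpUL(8),
        CmpF(4), CmpD(8), CmpN(4), CmpP(8);

        final int bytes;

        Cmp(int bytes) { this.bytes = bytes; }
    }

    // Conditional-move node kinds from table 1, with the size of the selected value.
    enum CMove {
        CMoveF(4), CMoveD(8);

        final int bytes;

        CMove(int bytes) { this.bytes = bytes; }
    }

    // Hypothetical stand-in for the type size check: the compare feeding a
    // CMove is treated as vectorizable only when both use elements of the same size.
    static boolean canVectorize(Cmp cmp, CMove cmove) {
        return cmp.bytes == cmove.bytes;
    }

    // Source pattern that compares ints but blends doubles; if compiled to
    // CmpI -> CMoveD, it is one of the combinations marked X in table 1.
    static void blend(int[] a, int[] b, double[] c, double[] d, double[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (a[i] < b[i]) ? c[i] : d[i];
        }
    }

    public static void main(String[] args) {
        // Print the model's version of table 1.
        System.out.println("Compare | CMoveF | CMoveD");
        for (Cmp cmp : Cmp.values()) {
            System.out.println(cmp + " | "
                    + (canVectorize(cmp, CMove.CMoveF) ? "V" : "X") + " | "
                    + (canVectorize(cmp, CMove.CMoveD) ? "V" : "X"));
        }
    }
}
```

Running `main` reproduces every V/X entry of table 1, which is why mixed-size pairs such as CmpI with CMoveD fall back to the scalar CMove on riscv.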

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123889184


More information about the hotspot-compiler-dev mailing list