RFR: 8342095: Add autovectorizer support for subword vector casts [v3]

Mon Feb 17 15:03:14 UTC 2025

On Mon, 17 Feb 2025 12:03:30 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I also updated the benchmark, and got these results:
>> 
>>                                                   Baseline                    Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units  Score    Error  Units  Improvement
>> VectorSubword.byteToInt      1024  avgt   12  185.700 ± 0.798  ns/op   37.427 ± 0.276  ns/op  (4.96x)
>> VectorSubword.byteToShort    1024  avgt   12  240.737 ± 1.087  ns/op   23.094 ± 0.502  ns/op  (10.42x)
>> VectorSubword.intToByte      1024  avgt   12  181.680 ± 0.553  ns/op   49.873 ± 1.613  ns/op  (3.64x)
>> VectorSubword.intToShort     1024  avgt   12  176.256 ± 1.414  ns/op   43.933 ± 4.310  ns/op  (4.01x)
>> VectorSubword.shortToByte    1024  avgt   12  245.600 ± 6.217  ns/op   28.426 ± 0.649  ns/op  (8.64x)
>> VectorSubword.shortToInt     1024  avgt   12  178.364 ± 2.921  ns/op   34.140 ± 0.229  ns/op  (5.22x)
>
> @jaskarth just ping me whenever I should have a look again!

@eme64 I think it should be good for another look over! I've addressed your review comments in the last commit.

As for the potential for performance degradation, I think it is unlikely: the code generated for the cast is quite small (it only needs to truncate or sign-extend), and the patch increases the amount of code that can be auto-vectorized. The one case I can think of is that it might make code vectorizable for which vectorization is not actually profitable, but that would be because we don't have a cost model yet.
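
To make the truncate/sign-extend point concrete, here is a minimal sketch of the loop shapes in question. The class and method names are hypothetical and this is not the benchmark source; whether a given loop actually vectorizes will depend on the platform and JVM flags.

```java
// Hypothetical illustration only: scalar loops containing subword casts.
final class SubwordCastSketch {
    // Widening cast: each byte is sign-extended to a 32-bit int.
    static void byteToInt(byte[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    // Narrowing cast: each int is truncated to its low 8 bits.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Narrowing cast between two subword types: short truncated to byte.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }
}
```

Per element, the widening loop needs only a sign extension and the narrowing loops need only a truncation, which is why the vectorized cast itself should stay cheap.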

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2663375243