RFR: 8342095: Add autovectorizer support for subword vector casts [v3]
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Mon Feb 17 15:03:14 UTC 2025
On Mon, 17 Feb 2025 12:03:30 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> I also updated the benchmark, and got these results:
>>
>>                                               Baseline                Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score   Error  Units    Score   Error  Units  Improvement
>> VectorSubword.byteToInt      1024  avgt   12  185.700 ± 0.798  ns/op   37.427 ± 0.276  ns/op  (4.96x)
>> VectorSubword.byteToShort    1024  avgt   12  240.737 ± 1.087  ns/op   23.094 ± 0.502  ns/op  (10.42x)
>> VectorSubword.intToByte      1024  avgt   12  181.680 ± 0.553  ns/op   49.873 ± 1.613  ns/op  (3.64x)
>> VectorSubword.intToShort     1024  avgt   12  176.256 ± 1.414  ns/op   43.933 ± 4.310  ns/op  (4.01x)
>> VectorSubword.shortToByte    1024  avgt   12  245.600 ± 6.217  ns/op   28.426 ± 0.649  ns/op  (8.64x)
>> VectorSubword.shortToInt     1024  avgt   12  178.364 ± 2.921  ns/op   34.140 ± 0.229  ns/op  (5.22x)
>
> @jaskarth just ping me whenever I should have a look again!
@eme64 I think it should be good for another look over! I've addressed your review comments in the last commit.
As for the potential for performance degradation, I think it is unlikely, since the code generated for a cast is quite small (it only needs to truncate or sign-extend) and the patch increases the amount of code that can be auto-vectorized. The one case I can think of is that the patch might cause code to be vectorized where vectorization is not profitable, but that would be because we don't have a cost model yet.
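
For illustration, the kind of loop this is about looks roughly like the sketch below. The method names mirror the benchmark names quoted above, but the class name and the loop bodies are assumptions written for this example, not the actual VectorSubword benchmark source. Whether a given loop is vectorized still depends on the platform and the JIT.

```java
import java.util.Arrays;

// Hypothetical sketch: simple element-wise casts between subword and int arrays.
public class SubwordCastSketch {
    // Widening cast: each byte is sign-extended to an int.
    static void byteToInt(byte[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    // Narrowing cast: each int is truncated to its low 8 bits.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Narrowing cast between two subword types: short truncated to byte.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {-128, -1, 0, 127};
        int[] ints = new int[bytes.length];
        byteToInt(bytes, ints);
        System.out.println(Arrays.toString(ints));
    }
}
```

Each loop body is a single load, cast, and store, which is why the per-element cost of the vectorized cast is expected to be small.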
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2663375243