RFR: 8342095: Add autovectorizer support for subword vector casts [v3]

Jasmine Karthikeyan jkarthikeyan at openjdk.org
Sun Feb 9 06:03:05 UTC 2025


On Thu, 6 Feb 2025 19:50:55 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>> 
>> 
>>                                                   Baseline                    Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score    Error  Units    Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>> 
>> 
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix some tests that now vectorize

I also updated the benchmark, and got these results:

                                                     Baseline                  Patch
Benchmark                  (SIZE)  Mode  Cnt    Score   Error  Units    Score   Error  Units  Improvement
VectorSubword.byteToInt      1024  avgt   12  185.700 ± 0.798  ns/op   37.427 ± 0.276  ns/op  (4.96x)
VectorSubword.byteToShort    1024  avgt   12  240.737 ± 1.087  ns/op   23.094 ± 0.502  ns/op  (10.42x)
VectorSubword.intToByte      1024  avgt   12  181.680 ± 0.553  ns/op   49.873 ± 1.613  ns/op  (3.64x)
VectorSubword.intToShort     1024  avgt   12  176.256 ± 1.414  ns/op   43.933 ± 4.310  ns/op  (4.01x)
VectorSubword.shortToByte    1024  avgt   12  245.600 ± 6.217  ns/op   28.426 ± 0.649  ns/op  (8.64x)
VectorSubword.shortToInt     1024  avgt   12  178.364 ± 2.921  ns/op   34.140 ± 0.229  ns/op  (5.22x)
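
For readers without the benchmark source at hand, the loops being measured are element-wise casts between array types. The following is only a minimal sketch of that loop shape: the class and method names are hypothetical and are not copied from the actual `VectorSubword` benchmark, and whether a given loop vectorizes still depends on the backend supporting the cast.

```java
// Illustrative sketch only: names are hypothetical and not taken from the
// actual VectorSubword benchmark in the PR.
public class SubwordCastSketch {
    // Narrowing cast: each int is truncated to its low 8 bits.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Narrowing cast: each short is truncated to its low 8 bits.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Widening cast: each byte is sign-extended to an int.
    static void byteToInt(byte[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    public static void main(String[] args) {
        int[] ints = {1, 257, -1, 128};
        byte[] bytes = new byte[ints.length];
        intToByte(ints, bytes);
        System.out.println(java.util.Arrays.toString(bytes)); // [1, 1, -1, -128]
    }
}
```

In each loop the load and the store operate on different element types, which is the mixed-type pack situation described in the quoted PR text above.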

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2646084657