RFR: 8342095: Add autovectorizer support for subword vector casts [v13]
Emanuel Peter
epeter at openjdk.org
Thu May 15 13:18:02 UTC 2025
On Mon, 12 May 2025 03:11:52 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                               Baseline                      Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score     Error  Units     Score    Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op    56.228 ±  3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op    43.332 ±  1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op    29.757 ±  1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Check for AVX2 for byte/long conversions
> This is a good point. While testing, I experimented with patterns like this:
>
> ```java
> private static short[] testSubwordVector(short[] out, int[] in) {
>     for (int i = 0; i < 512; i++) {
>         out[i] = (short) (((short) in[i]) + (short) in[i]);
>     }
>
>     return out;
> }
> ```
>
> The IR it produces looks like: `StoreC(AddI(RShiftI(LShiftI(LoadI, 16), 16)`. The same thing happens for sign extension as well. I didn't investigate too deeply, but I think the shifts prevent this pattern from vectorizing. The shifts are needed in the scalar IR since we don't have an `AddS` node, but in the future, when translating the IR to the vector graph, we could convert the shift pattern into a `VectorCastX2Y` node as well.
I suppose there are two options here when vectorizing:
- Cast `short -> int`, do the add in `int`, then cast the result back `int -> short`.
- Somehow detect that this is effectively an `AddS` in the type analysis phase of SuperWord, and then rewrite the graph so that the shifts are no longer needed. This would be more complicated, but might give us better results in the end.
Is there already an RFE for this? If not, would you mind filing one?
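
To illustrate why the first option should be semantically safe, here is a minimal scalar sketch. The class and method names are hypothetical and not part of the PR. It checks that narrowing each operand before the add gives the same `short` as adding in `int` and narrowing once, and that the `(x << 16) >> 16` shift pair from the quoted IR behaves like an `int -> short` cast followed by sign extension. It says nothing about what SuperWord actually emits.

```java
// Hypothetical helper class, for illustration only.
class SubwordAddSketch {
    // Scalar form from the quoted example: narrow each operand, add, narrow again.
    static short addNarrowFirst(int a, int b) {
        return (short) (((short) a) + (short) b);
    }

    // Option 1 in scalar terms: do the add in int, narrow once at the end.
    static short addInInt(int a, int b) {
        return (short) (a + b);
    }

    // The shift pair seen in the scalar IR: keeps the low 16 bits and sign-extends them.
    static int signExtend16(int x) {
        return (x << 16) >> 16;
    }

    // Returns true if both equivalences hold for every pair drawn from samples.
    static boolean agree(int[] samples) {
        for (int a : samples) {
            if (signExtend16(a) != (short) a) {
                return false;
            }
            for (int b : samples) {
                if (addNarrowFirst(a, b) != addInInt(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 32767, 32768, 65535, 65536, Integer.MAX_VALUE, Integer.MIN_VALUE};
        System.out.println("forms agree: " + agree(samples));
    }
}
```

The equivalence holds because the low 16 bits of a sum depend only on the low 16 bits of its operands, so the intermediate narrowing does not change the stored value.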
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2883775942