RFR: 8342095: Add autovectorizer support for subword vector casts [v14]
Emanuel Peter
epeter at openjdk.org
Wed Aug 27 10:25:57 UTC 2025
On Thu, 31 Jul 2025 03:37:47 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                               Baseline                 Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
> - Update tests, cleanup logic
> - Merge branch 'master' into vectorize-subword
> - Check for AVX2 for byte/long conversions
> - Whitespace and benchmark tweak
> - Address more comments, make test and benchmark more exhaustive
> - Merge from master
> - Fix copyright after merge
> - Fix copyright
> - Merge
> - Implement patch with VectorCastNode::implemented
> - ... and 6 more: https://git.openjdk.org/jdk/compare/8fcbb110...aabaafba
I have a few more comments. It is really exciting that these cases could soon work! Thanks for working on it 😊
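For anyone skimming the thread, here is a minimal sketch of the kind of narrowing-cast loop the quoted description is about. It assumes kernels shaped like the benchmark names in the table above (`intToByte`, `intToShort`, `shortToByte`); the class name `NarrowingLoops` and the method bodies are hypothetical and not copied from the PR. Each loop loads one element type and stores a narrower one, which is the def/use type mismatch that previously made superword discard the packs.

```java
// Hypothetical illustration only: these loops are NOT taken from the PR.
// Each one reads a wider element type and stores a narrower one, so the
// load pack and the store pack have different basic types.
class NarrowingLoops {
    // int -> byte: keeps the low 8 bits of each element.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // int -> short: keeps the low 16 bits of each element.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // short -> byte: keeps the low 8 bits of each element.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }
}
```

Whatever the compiler does with these loops, the results have to match Java's scalar narrowing semantics (truncation to the low bits), which is what the IR tests' correctness checks ultimately rely on.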
src/hotspot/share/opto/superword.cpp line 2422:
> 2420: // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here.
> 2421: return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt);
> 2422: }
Not sure if we discussed this before: should we not move this to `VectorCastNode`, rather than having it in `SuperWord`?
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 197:
> 195:
> 196: // If the use and def types are different, emit a cast node
> 197: if (use_bt != def_bt && !p0->is_Convert() && SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())) {
Can `SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())` really be false here, so that we need to check it (and continue in the "else" if it is false)? Or should it rather be an assert?
test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 513:
> 511: @Test
> 512: @IR(applyIfCPUFeature = { "avx", "true" },
> 513: applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"},
Do you think these would be supported with `asimd` as well?
If you just cannot test with it, feel free to file an RFE, and then I can find someone to take care of it (e.g. as a starter bug).
test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 76:
> 74:
> 75: @Test
> 76: @IR(applyIfCPUFeature = { "avx2", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" })
Do you think we can make the vector size more precise here?
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-3159195606
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303500398
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303503806
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303508579
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2303511623