RFR: 8342095: Add autovectorizer support for subword vector casts [v4]

Emanuel Peter epeter at openjdk.org
Thu Feb 13 10:16:14 UTC 2025


On Sun, 9 Feb 2025 06:03:03 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>> 
>> 
>>                                                   Baseline                    Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score    Error  Units    Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>> 
>> 
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add new conversions to benchmark

@jaskarth That looks great! Thanks for the updates!

A little control question:
We now add additional vector operations into the loop body. How do we know this will not lead to a regression in some cases? I think it should not... right? Are the casts no-ops, or could they have some cost to them? Could there be any case where the wins from vectorization would not make up for the extra vector cast node?

src/hotspot/cpu/x86/matcher_x86.hpp line 273:

> 271:     if (to_bt == from_bt) {
> 272:       return false;
> 273:     }

Hmm, do we expect that this ever gets triggered? Or would that be a bug? Maybe not, but could be worth adding a defensice assert here, what do you think?

src/hotspot/cpu/x86/matcher_x86.hpp line 284:

> 282:         return false;
> 283:       }
> 284:     }

You could leave a quick comment about why `CHAR` is not yet covered here.

src/hotspot/share/opto/vtransform.hpp line 537:

> 535:   virtual VTransformApplyResult apply(const VLoopAnalyzer& vloop_analyzer,
> 536:                                       const GrowableArray<Node*>& vnode_idx_to_transformed_node) const override;
> 537:   NOT_PRODUCT(virtual const char* name() const override { return "Cast"; };)

Suggestion:

  NOT_PRODUCT(virtual const char* name() const override { return "CastVector"; };)

test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 333:

> 331:         applyIfPlatform = {"64-bit", "true"},
> 332:         applyIf = {"AlignVector", "false"},
> 333:         applyIfCPUFeature = {"avx", "true"})

This may be a little nit-picky. But why have a new test-file when this test here was already trying to cover the conversion cases? I think I wrote it back then, and was just too lazy to write all conversion cases. I'd suggest you move your cases up here ;)

-------------

PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-2614468035
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r1954188505
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r1954195558
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r1954203200
PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r1954207931


More information about the hotspot-compiler-dev mailing list