RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v3]

Xiaohong Gong xgong at openjdk.org
Wed Mar 22 08:33:47 UTC 2023


On Tue, 21 Mar 2023 16:16:31 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks:
>> 
>> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically.
>> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations.
>> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler.
>> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones.
>> 
>> Upon these changes, a `rearrange` can emit more efficient code:
>> 
>>     var species = IntVector.SPECIES_128;
>>     var v1 = IntVector.fromArray(species, SRC1, 0);
>>     var v2 = IntVector.fromArray(species, SRC2, 0);
>>     v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>> 
>>     Before:
>>     movabs $0x751589fa8,%r10            ;   {oop([I{0x0000000751589fa8})}
>>     vmovdqu 0x10(%r10),%xmm2
>>     movabs $0x7515a0d08,%r10            ;   {oop([I{0x00000007515a0d08})}
>>     vmovdqu 0x10(%r10),%xmm1
>>     movabs $0x75158afb8,%r10            ;   {oop([I{0x000000075158afb8})}
>>     vmovdqu 0x10(%r10),%xmm0
>>     vpand  -0xddc12(%rip),%xmm0,%xmm0        # Stub::vector_int_to_byte_mask
>>                                                             ;   {external_word}
>>     vpackusdw %xmm0,%xmm0,%xmm0
>>     vpackuswb %xmm0,%xmm0,%xmm0
>>     vpmovsxbd %xmm0,%xmm3
>>     vpcmpgtd %xmm3,%xmm1,%xmm3
>>     vtestps %xmm3,%xmm3
>>     jne    0x00007fc2acb4e0d8
>>     vpmovzxbd %xmm0,%xmm0
>>     vpermd %ymm2,%ymm0,%ymm0
>>     movabs $0x751588f98,%r10            ;   {oop([I{0x0000000751588f98})}
>>     vmovdqu %xmm0,0x10(%r10)
>> 
>>     After:
>>     movabs $0x751589c78,%r10            ;   {oop([I{0x0000000751589c78})}
>>     vmovdqu 0x10(%r10),%xmm1
>>     movabs $0x75158ac88,%r10            ;   {oop([I{0x000000075158ac88})}
>>     vmovdqu 0x10(%r10),%xmm2
>>     vpxor  %xmm0,%xmm0,%xmm0
>>     vpcmpgtd %xmm2,%xmm0,%xmm3
>>     vtestps %xmm3,%xmm3
>>     jne    0x00007fa818b27cb1
>>     vpermd %ymm1,%ymm2,%ymm0
>>     movabs $0x751588c68,%r10            ;   {oop([I{0x0000000751588c68})}
>>     vmovdqu %xmm0,0x10(%r10)
>>     
>> Please take a look and leave reviews. Thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - missing casts
>  - clean up

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 118:

> 116:         return (VectorShuffle<E>) v.rearrange(shuffle.cast(vspecies().asIntegral()))
> 117:                 .toShuffle()
> 118:                 .cast(vspecies());

Style issue. Suggest to change to:

return (VectorShuffle<E>) v.rearrange(shuffle.cast(vspecies().asIntegral()))
                           .toShuffle()
                           .cast(vspecies());

I also noticed that the similar shuffle cast code is used more frequently. Could we wrap such code `toShuffle().cast(vspecies())` to a separate method?

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 130:

> 128:         } else {
> 129:             v = v.blend(v.lanewise(VectorOperators.ADD, length()),
> 130:                     v.compare(VectorOperators.LT, 0));

Style issue. Suggest to change to:

v = v.blend(v.lanewise(VectorOperators.ADD, length()),
            v.compare(VectorOperators.LT, 0));

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 198:

> 196:         if ((length() & (length() - 1)) != 0) {
> 197:             return wrap ? shuffleFromOp(i -> (VectorIntrinsics.wrapToRange(i * step + start, length())))
> 198:                     : shuffleFromOp(i -> i * step + start);

Code style issue. Suggest to:

return wrap ? shuffleFromOp(i -> (VectorIntrinsics.wrapToRange(i * step + start, length())))
            : shuffleFromOp(i -> i * step + start);

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 204:

> 202:         Vector iota = species.iota();
> 203:         iota = iota.lanewise(VectorOperators.MUL, step)
> 204:                 .lanewise(VectorOperators.ADD, start);

Style issue. Same as above.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144384585
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144389023
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144390218
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144390692


More information about the hotspot-compiler-dev mailing list