RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2]

Tue Mar 21 16:16:34 UTC 2023

On Sun, 19 Mar 2023 19:38:04 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks:
>> 
>> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically.
>> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations.
>> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler.
>> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones.
>> 
>> Upon these changes, a `rearrange` can emit more efficient code:
>> 
>>     var species = IntVector.SPECIES_128;
>>     var v1 = IntVector.fromArray(species, SRC1, 0);
>>     var v2 = IntVector.fromArray(species, SRC2, 0);
>>     v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>> 
>>     Before:
>>     movabs $0x751589fa8,%r10            ;   {oop([I{0x0000000751589fa8})}
>>     vmovdqu 0x10(%r10),%xmm2
>>     movabs $0x7515a0d08,%r10            ;   {oop([I{0x00000007515a0d08})}
>>     vmovdqu 0x10(%r10),%xmm1
>>     movabs $0x75158afb8,%r10            ;   {oop([I{0x000000075158afb8})}
>>     vmovdqu 0x10(%r10),%xmm0
>>     vpand  -0xddc12(%rip),%xmm0,%xmm0        # Stub::vector_int_to_byte_mask
>>                                                             ;   {external_word}
>>     vpackusdw %xmm0,%xmm0,%xmm0
>>     vpackuswb %xmm0,%xmm0,%xmm0
>>     vpmovsxbd %xmm0,%xmm3
>>     vpcmpgtd %xmm3,%xmm1,%xmm3
>>     vtestps %xmm3,%xmm3
>>     jne    0x00007fc2acb4e0d8
>>     vpmovzxbd %xmm0,%xmm0
>>     vpermd %ymm2,%ymm0,%ymm0
>>     movabs $0x751588f98,%r10            ;   {oop([I{0x0000000751588f98})}
>>     vmovdqu %xmm0,0x10(%r10)
>> 
>>     After:
>>     movabs $0x751589c78,%r10            ;   {oop([I{0x0000000751589c78})}
>>     vmovdqu 0x10(%r10),%xmm1
>>     movabs $0x75158ac88,%r10            ;   {oop([I{0x000000075158ac88})}
>>     vmovdqu 0x10(%r10),%xmm2
>>     vpxor  %xmm0,%xmm0,%xmm0
>>     vpcmpgtd %xmm2,%xmm0,%xmm3
>>     vtestps %xmm3,%xmm3
>>     jne    0x00007fa818b27cb1
>>     vpermd %ymm1,%ymm2,%ymm0
>>     movabs $0x751588c68,%r10            ;   {oop([I{0x0000000751588c68})}
>>     vmovdqu %xmm0,0x10(%r10)
>>     
>> Please take a look and leave reviews. Thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix Matcher::vector_needs_load_shuffle

Yes I will try to polish the patch more after finding the cause of the failure in x86_32. The failure is strange, though, it does not occur on x86_64 for some reasons.

> One annoyance in the API which propagates down into the implementation is `VectorShuffle<E>` and `VectorMask<E>` have `E` that is the lane element type.

Yes I agree, a shuffle merely contains the lane indices while a mask is an array of boolean, it would be a good cleanup to remove `E` from the interface.

> However, i don't have a good sense of the implications this has to the current HotSpot implementation and whether it is feasible.

Note that generics are erased, so from the VM point of view, a `VectorMask<E>` and a `VectorMask` is indifferent. As a result, removing the type parameter should not have any impact on the VM. Some details may have to change though, as element types are removed, a mask or shuffle would only be validated in accordance to its length, and we need to insert a cast at use sites. The cast will be removed if it is actually the same species so there is little concern regarding the machine code emitted.

Thanks a lot.

I have moved most of the methods to `AbstractVector` and `AbstractShuffle`, I have to resort to raw types, though, since there seems to be no way to do the same with wild cards, and the generics mechanism is not powerful enough for things like `Vector<E.integral>`. The remaining failure seems to be related to [JDK-8304676](https://bugs.openjdk.org/projects/JDK/issues/JDK-8304676), so I think this patch is ready for review now.

> The mask implementation is specialized by the species of vectors it operates on, but does it have to be

Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type. However, this information does not have to be visible to the API, similar to how we currently handle the vector length, we can have `class AbstractMask<E> implements VectorMask`. As a result, the cast method would be useless and can be removed in the API, but our implementation details would still use it, for example

    Vector<E> blend(Vector<E> v, VectorMask w) {
        AbstractMask<?> aw = (AbstractMask<?>) w;
        AbstractMask<E> tw = aw.cast(vspecies());
        return VectorSupport.blend(...);
    }

    Vector<E> rearrange(VectorShuffle s) {
        AbstractShuffle<?> as = (AbstractShuffle<?>) s;
        AbstractShuffle<E> ts = s.cast(vspecies());
        return VectorSupport.rearrangeOp(...);
    }

What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1477581887
PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1478140557