RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2]
Emanuel Peter
epeter at openjdk.org
Tue Oct 15 16:32:15 UTC 2024
On Sun, 6 Oct 2024 08:32:20 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Hi,
>>
>> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout.
>>
>> Regarding the related issues:
>>
>> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout.
>> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate`
>> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests.
>>
>> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables.
>>
>> Please take a look and leave reviews. Thanks a lot.
>>
>> The description of the original PR:
>>
>> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks:
>>
>> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically.
>> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations.
>> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler.
>> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones.
>> Upon these changes, a `rearrange` can emit more efficient code:
>>
>> var species = IntVector.SPECIES_128;
>> var v1 = IntVector.fromArray(species, SRC1, 0);
>> var v2 = IntVector.fromArray(species, SRC2, 0);
>> v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>>
>> Before:
>> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})}
>> vmovdqu 0x10(%r10),%xmm2
>> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})}
>> vmovdqu 0x10(%r10),%xmm1
>> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})}
>> vmovdqu 0x10(%r10),%xmm0
>> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt...
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
> [vectorapi] Refactor VectorShuffle implementation
src/hotspot/cpu/x86/x86.ad line 2172:
> 2170:
> 2171: // Return true if Vector::rearrange needs preparation of the shuffle argument
> 2172: bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) {
I think the name needs to be more expressive. If I read it alone, then I would think that it is about all kinds of vectors ... and it is confusing because what is a "load shuffle"? Are we shuffling loads or loading shuffles?
src/hotspot/share/opto/vectornode.hpp line 1618:
> 1616: public:
> 1617: VectorLoadShuffleNode(Node* in, const TypeVect* vt)
> 1618: : VectorNode(in, vt) {}
Can you add a comment above "class VectorLoadShuffleNode" to say what its semantics are?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1801531980
PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1801536233
More information about the core-libs-dev
mailing list