RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation
Quan Anh Mai
qamai at openjdk.org
Wed Sep 18 16:15:38 UTC 2024
Hi,
This is a redo of https://github.com/openjdk/jdk/pull/13093, and is mostly just a revert of the backout.
Regarding the related issues:
- [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout.
- [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to a missing `ForceInline` on `AbstractVector::toBitsVectorTemplate`.
- [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592): I have not been able to find the root cause. I'm not sure whether this is a blocker; at the moment I cannot even build the x86-32 tests.
Finally, I moved the implementations of some public methods, and of methods that call into intrinsics, to the concrete classes, as that may help the compiler know the correct types of the variables.
Please take a look and leave reviews. Thanks a lot.
The description of the original PR:
This patch reimplements `VectorShuffle` as a vector of the bit type. Currently, a `VectorShuffle` is stored as a byte array and is expanded on each use. This has several drawbacks:
- Inefficient conversions between a shuffle and its corresponding vector. This hinders performance when the shuffle indices are not constant and are loaded or computed dynamically.
- Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations.
- Some redundant intrinsics are needed to support this handling, as well as special considerations in the C2 compiler.
- Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones.
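To make the first two drawbacks concrete, here is a minimal scalar sketch in plain Java. It does not use the Vector API; the class `ShuffleLayoutModel` and its methods are hypothetical names made up for illustration, and range checking is left out on purpose. It only models the difference between indices stored as bytes, which must be widened on every use, and indices already stored in lanes of the matching integral type.

```java
public class ShuffleLayoutModel {
    /** Models the byte-array layout: every use first widens the indices to int lanes. */
    static int[] rearrangeByteBacked(int[] src, byte[] shuffle) {
        int[] widened = new int[shuffle.length];
        for (int i = 0; i < shuffle.length; i++) {
            widened[i] = shuffle[i]; // per-use byte -> int expansion
        }
        return rearrangeLaneBacked(src, widened);
    }

    /** Models the lane-typed layout: the indices are consumed as they are. */
    static int[] rearrangeLaneBacked(int[] src, int[] shuffle) {
        int[] dst = new int[shuffle.length];
        for (int i = 0; i < shuffle.length; i++) {
            dst[i] = src[shuffle[i]]; // result lane i takes source lane shuffle[i]
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] viaBytes = rearrangeByteBacked(src, new byte[] {3, 2, 1, 0});
        int[] viaLanes = rearrangeLaneBacked(src, new int[] {3, 2, 1, 0});
        System.out.println(java.util.Arrays.toString(viaBytes));
        System.out.println(java.util.Arrays.equals(viaBytes, viaLanes));
    }
}
```

Both paths produce the same result; the byte-backed one simply pays for an extra widening loop each time, which is the kind of work the patch aims to avoid.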
With these changes, a `rearrange` can compile to more efficient code:
var species = IntVector.SPECIES_128;
var v1 = IntVector.fromArray(species, SRC1, 0);
var v2 = IntVector.fromArray(species, SRC2, 0);
v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
Before:
movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})}
vmovdqu 0x10(%r10),%xmm2
movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})}
vmovdqu 0x10(%r10),%xmm1
movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})}
vmovdqu 0x10(%r10),%xmm0
vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask
; {external_word}
vpackusdw %xmm0,%xmm0,%xmm0
vpackuswb %xmm0,%xmm0,%xmm0
vpmovsxbd %xmm0,%xmm3
vpcmpgtd %xmm3,%xmm1,%xmm3
vtestps %xmm3,%xmm3
jne 0x00007fc2acb4e0d8
vpmovzxbd %xmm0,%xmm0
vpermd %ymm2,%ymm0,%ymm0
movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})}
vmovdqu %xmm0,0x10(%r10)
After:
movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})}
vmovdqu 0x10(%r10),%xmm1
movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})}
vmovdqu 0x10(%r10),%xmm2
vpxor %xmm0,%xmm0,%xmm0
vpcmpgtd %xmm2,%xmm0,%xmm3
vtestps %xmm3,%xmm3
jne 0x00007fa818b27cb1
vpermd %ymm1,%ymm2,%ymm0
movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})}
vmovdqu %xmm0,0x10(%r10)
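As a reading aid for the After listing, the following plain-Java sketch models its visible steps in scalar form: `vpxor` produces a zero vector, `vpcmpgtd %xmm2,%xmm0,%xmm3` marks the lanes where zero is greater than the index, `vtestps` plus `jne` branch away if any lane was marked, and `vpermd` does the permutation. The class and method names are hypothetical, and throwing an exception is only an assumption about what the branch target does, since the listing does not show it.

```java
public class RearrangeCheckModel {
    /** Scalar stand-in for vpxor + vpcmpgtd + vtestps: is any index lane below zero? */
    static boolean anyNegative(int[] indexes) {
        for (int idx : indexes) {
            if (idx < 0) {
                return true;
            }
        }
        return false;
    }

    /** Check first, then permute. The exception stands in for the jne slow path (assumed). */
    static int[] checkedRearrange(int[] src, int[] indexes) {
        if (anyNegative(indexes)) {
            throw new IndexOutOfBoundsException("negative shuffle index");
        }
        int[] dst = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            dst[i] = src[indexes[i]]; // models the lane selection done by vpermd
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        System.out.println(java.util.Arrays.toString(checkedRearrange(src, new int[] {2, 0, 3, 1})));
        System.out.println(anyNegative(new int[] {0, -1, 2, 3}));
    }
}
```

The Before listing reaches the same permutation only after masking and packing the indices down to bytes and widening them again twice (`vpmovsxbd` for the check, `vpmovzxbd` for the permutation), which is the round trip the After listing no longer contains.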
-------------
Commit messages:
- copyright year
- remove LoadShuffle from riscv, whitespace
- tighten concrete types
- [vectorapi] Refactor VectorShuffle implementation
Changes: https://git.openjdk.org/jdk/pull/21042/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8310691
Stats: 4984 lines in 64 files changed: 2984 ins; 981 del; 1019 mod
Patch: https://git.openjdk.org/jdk/pull/21042.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042
PR: https://git.openjdk.org/jdk/pull/21042