RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v4]
Quan Anh Mai
qamai at openjdk.org
Wed Mar 22 12:46:33 UTC 2023
> Hi,
>
> This patch reimplements `VectorShuffle` as a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array and is expanded on use. This has several drawbacks:
>
> 1. Inefficient conversions between a shuffle and its corresponding vector. This hurts performance when the shuffle indices are not constant and are loaded or computed dynamically.
> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations.
> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler.
> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones.
>
> With these changes, a `rearrange` can emit more efficient code:
>
> var species = IntVector.SPECIES_128;
> var v1 = IntVector.fromArray(species, SRC1, 0);
> var v2 = IntVector.fromArray(species, SRC2, 0);
> v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>
> Before:
> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})}
> vmovdqu 0x10(%r10),%xmm2
> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})}
> vmovdqu 0x10(%r10),%xmm1
> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})}
> vmovdqu 0x10(%r10),%xmm0
> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask
> ; {external_word}
> vpackusdw %xmm0,%xmm0,%xmm0
> vpackuswb %xmm0,%xmm0,%xmm0
> vpmovsxbd %xmm0,%xmm3
> vpcmpgtd %xmm3,%xmm1,%xmm3
> vtestps %xmm3,%xmm3
> jne 0x00007fc2acb4e0d8
> vpmovzxbd %xmm0,%xmm0
> vpermd %ymm2,%ymm0,%ymm0
> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})}
> vmovdqu %xmm0,0x10(%r10)
>
> After:
> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})}
> vmovdqu 0x10(%r10),%xmm1
> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})}
> vmovdqu 0x10(%r10),%xmm2
> vpxor %xmm0,%xmm0,%xmm0
> vpcmpgtd %xmm2,%xmm0,%xmm3
> vtestps %xmm3,%xmm3
> jne 0x00007fa818b27cb1
> vpermd %ymm1,%ymm2,%ymm0
> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})}
> vmovdqu %xmm0,0x10(%r10)
>
> Please take a look and leave reviews. Thanks a lot.
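
For readers less familiar with `rearrange`, here is a minimal scalar sketch of the lane selection and index check that the snippet above relies on. It is a plain-Java model, not the Vector API implementation or the code in this patch. The class name `ShuffleModel` and its method are hypothetical, and rejecting every index outside [0, length) is a simplifying assumption about how out-of-range shuffle indices are handled.

```java
import java.util.Arrays;

public class ShuffleModel {
    // Scalar model of a rearrange: lane i of the result takes src[indices[i]].
    // Assumption: any index outside [0, src.length) is rejected up front,
    // which stands in for the range check seen in the generated code.
    public static int[] rearrange(int[] src, int[] indices) {
        int vlength = src.length;
        if (indices.length != vlength) {
            throw new IllegalArgumentException("index count must match lane count");
        }
        int[] dst = new int[vlength];
        for (int i = 0; i < vlength; i++) {
            int idx = indices[i];
            if (idx < 0 || idx >= vlength) {
                throw new IndexOutOfBoundsException("bad shuffle index " + idx + " at lane " + i);
            }
            dst[i] = src[idx];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] indices = {3, 2, 1, 0}; // reverse the four lanes
        System.out.println(Arrays.toString(rearrange(src, indices))); // prints [40, 30, 20, 10]
    }
}
```

In this model the indices are already `int`, so the check is a single integer comparison per lane with no widening step, which is the kind of saving the "After" listing shows for a 128-bit `int` species.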
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
reviews
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/13093/files
- new: https://git.openjdk.org/jdk/pull/13093/files/4caa9d10..e0b9ee88
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=02-03
Stats: 17 lines in 5 files changed: 0 ins; 0 del; 17 mod
Patch: https://git.openjdk.org/jdk/pull/13093.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093
PR: https://git.openjdk.org/jdk/pull/13093