RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
Paul Sandoz
psandoz at openjdk.org
Fri Sep 13 20:09:05 UTC 2024
On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if the shuffle contains any exceptional source index. This check makes the generated code less efficient. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes, and adds optimizations so that efficient code is generated.
>
> Summary of changes is as follows:
> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
> 2) Intrinsic support for wrapIndexes and selectFrom, to generate efficient code.
>
> For the following source:
>
>
> public void test() {
>     var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>     for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
>         var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>         index.selectFrom(inpvect).intoArray(byteres, j);
>     }
> }
>
>
> The code generated for the inner main loop now looks as follows:
> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
> 0x00007f40d02274d0: movslq %ebx,%r13
> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
> 0x00007f40d022751f: add $0x40,%ebx
> 0x00007f40d0227522: cmp %r8d,%ebx
> 0x00007f40d0227525: jl 0x00007f40d02274d0
>
> Best Regards,
> Sandhya
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2439:
> 2437: (v1, s_, m_) -> v1.uOp((i, a) -> {
> 2438: int ei = s_.laneSource(i);
> 2439: return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei);
The `ei < 0` test is redundant.
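A minimal scalar sketch of why the test can be dropped, assuming the shuffle reaching this lambda has already been through `wrapIndexes()` and that VLENGTH is a power of two. The class and helper names are hypothetical and not JDK code; the helper only mirrors the documented effect of `VectorShuffle.wrapIndex`, which reduces any index into the range [0, VLENGTH-1]:

```java
class WrapIndexSketch {
    // Hypothetical helper (not the JDK implementation): reduce any int
    // index into [0, vlen - 1]. For a power-of-two vlen this gives the
    // same value as index & (vlen - 1).
    static int wrapIndex(int index, int vlen) {
        return Math.floorMod(index, vlen);
    }

    public static void main(String[] args) {
        int vlen = 16; // lane count of a 128-bit byte vector
        int[] samples = {-17, -1, 0, 15, 16, 33};
        for (int idx : samples) {
            int wrapped = wrapIndex(idx, vlen);
            // A wrapped index is never negative, so a later
            // "ei < 0" test on it can never be true.
            System.out.println(idx + " -> " + wrapped);
        }
    }
}
```

Under those assumptions every lane source index is already non-negative by the time the lambda runs, so only the mask test remains meaningful.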
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2637:
> 2635: *
> 2636: * For each lane {@code N} of the shuffle, and for each lane
> 2637: * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle,
The pseudocode below, starting at line 2644, needs adjusting to:
Vector<E> r = this.rearrange(s);
return broadcast(0).blend(r, m);
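As a rough illustration of what that two-line pseudocode computes, here is a scalar model using plain arrays. It is only a sketch: the class and method names are hypothetical, `byte[]`, `int[]` and `boolean[]` stand in for the vector, shuffle and mask, and wrapping is modeled with `Math.floorMod` on the assumption that `rearrange` now wraps its indexes:

```java
import java.util.Arrays;

class MaskedRearrangeSketch {
    // Scalar model of v.rearrange(s): lane i takes v[wrap(s[i])].
    static byte[] rearrange(byte[] v, int[] s) {
        byte[] r = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[Math.floorMod(s[i], v.length)];
        }
        return r;
    }

    // Scalar model of broadcast(0).blend(r, m): lane i is r[i] where
    // the mask is set, and 0 where it is unset.
    static byte[] maskedRearrange(byte[] v, int[] s, boolean[] m) {
        byte[] r = rearrange(v, s);
        byte[] out = new byte[v.length]; // zero-filled, like broadcast(0)
        for (int i = 0; i < v.length; i++) {
            if (m[i]) {
                out[i] = r[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] v = {10, 20, 30, 40};
        int[] s = {3, -1, 4, 1};               // -1 and 4 are out of range
        boolean[] m = {true, false, true, true};
        // prints [40, 0, 10, 20]
        System.out.println(Arrays.toString(maskedRearrange(v, s, m)));
    }
}
```

Out-of-range indexes wrap instead of throwing, and unset mask lanes come out as zero regardless of the index they carry.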
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2755:
> 2753: *
> 2754: * The result is the same as the expression
> 2755: * {@code v.rearrange(this.toShuffle().wrapIndexes())}.
Since we also adjusted `rearrange`, the existing expression is fine. I recommend no change here or to the mask-accepting version.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759431093
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759428672
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759418829