RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3]
Jatin Bhateja
jbhateja at openjdk.org
Thu Sep 19 07:32:38 UTC 2024
On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
>>
>> Summary of changes is as follows:
>> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code
>>
>> For the following source:
>>
>>
>> public void test() {
>> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
>> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>> index.selectFrom(inpvect).intoArray(byteres, j);
>> }
>> }
>>
>>
>> The code generated for inner main now looks as follows:
>> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
>> 0x00007f40d02274d0: movslq %ebx,%r13
>> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
>> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
>> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
>> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
>> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
>> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
>> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
>> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
>> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
>> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
>> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
>> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
>> 0x00007f40d022751f: add $0x40,%ebx
>> 0x00007f40d0227522: cmp %r8d,%ebx
>> 0x00007f40d0227525: jl 0x00007f40d02274d0
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> Change method name
Hi @sviswa7 , some comments, overall patch looks good to me.
Best Regards,
Jatin
src/hotspot/share/opto/vectorIntrinsics.cpp line 2120:
> 2118:
> 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) {
> 2120: return false; // dead code
Why dead code in comments ?
src/hotspot/share/opto/vectorIntrinsics.cpp line 2129:
> 2127: NodeClassNames[argument(2)->Opcode()],
> 2128: NodeClassNames[argument(3)->Opcode()]);
> 2129: return false; // not enough info for intrinsification
Please club this with above condition to be consistent with other inline expanders.
src/hotspot/share/opto/vectorIntrinsics.cpp line 2141:
> 2139: }
> 2140: BasicType elem_bt = elem_type->basic_type();
> 2141:
Remove new line.
src/hotspot/share/opto/vectorIntrinsics.cpp line 2144:
> 2142: int num_elem = vlen->get_con();
> 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) {
> 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem);
Will num_elem < 4 not be handled by L2149 since we have an implementation limitation to support less than 32-bit shuffle / masks.
src/hotspot/share/opto/vectorIntrinsics.cpp line 2171:
> 2169: use_predicate = false;
> 2170: if(!is_masked_op ||
> 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) ||
Suggestion:
(!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) ||
src/hotspot/share/opto/vectorIntrinsics.cpp line 2188:
> 2186:
> 2187: if (v1 == nullptr || v2 == nullptr) {
> 2188: return false; // operand unboxing failed
To be consistent with other expanders please emit proper error for unboxing failure like on following line.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426
src/hotspot/share/opto/vectorIntrinsics.cpp line 2197:
> 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem);
> 2196: if (mask == nullptr) {
> 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1",
Error should an unboxing failure here.
-------------
PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2314643808
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277056
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277739
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766278169
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766297640
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766292679
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766303620
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766304688
More information about the hotspot-compiler-dev
mailing list