[vectorIntrinsics+mask] RFR: 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets. [v3]

Fri Aug 13 18:24:16 UTC 2021

On Thu, 12 Aug 2021 20:25:00 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>> 
>>  - 8270349: Review comments resolution.
>>  - Merge branch 'vectorIntrinsics+mask' of http://github.com/openjdk/panama-vector into JDK-8270349
>>  - 8270349: Merge with latest vectorIntrinsics+mask tip + extend backend support for XorV,AndV,OrV and Compare masked operations.
>>  - 8270349: Fix for 32-bit build failure.
>>  - 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets.
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8449:
> 
>> 8447:     kshiftrbl(kscratch, kscratch, 8-masklen);
>> 8448:     kandbl(kscratch, kscratch, src);
>> 8449:     ktestbl(kscratch, kscratch);
> 
> This could be replaced by: ktestbl(kscratch, src);

DONE
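
For readers following the thread, a minimal C++ sketch of why the shorter form is sufficient for the zero flag. It is a toy model, not HotSpot code: 8-bit opmask registers are modelled as `uint8_t`, only the ZF result of KTESTB (set when the AND of both sources is zero) is modelled, and the helper names `ktest_zf`, `old_sequence_zf`, `new_sequence_zf` and `check_ktest_forms` are made up for illustration. It also assumes the caller consumes only ZF and does not reuse the value left in kscratch, which the thread does not state.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the ZF result of KTESTB: ZF = ((a & b) == 0).
inline bool ktest_zf(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a & b) == 0;
}

// Quoted sequence: kandbl(k, k, src); ktestbl(k, k);
inline bool old_sequence_zf(std::uint8_t k, std::uint8_t src) {
  const std::uint8_t anded = static_cast<std::uint8_t>(k & src);
  return ktest_zf(anded, anded);
}

// Suggested sequence: ktestbl(k, src);
inline bool new_sequence_zf(std::uint8_t k, std::uint8_t src) {
  return ktest_zf(k, src);
}

// Exhaustively compare both forms over every pair of 8-bit masks.
inline void check_ktest_forms() {
  for (int k = 0; k < 256; ++k) {
    for (int s = 0; s < 256; ++s) {
      const std::uint8_t kk = static_cast<std::uint8_t>(k);
      const std::uint8_t ss = static_cast<std::uint8_t>(s);
      assert(old_sequence_zf(kk, ss) == new_sequence_zf(kk, ss));
    }
  }
}
```

Calling `check_ktest_forms()` from a small `main` walks all 65536 mask pairs. The sketch covers ZF only; the carry flag of KTESTB is computed differently and is not modelled here.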

> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8464:
> 
>> 8462:     kshiftlbl(kscratch, kscratch, masklen);
>> 8463:     korbl(kscratch, kscratch, src);
>> 8464:     kortestbl(kscratch, kscratch);
> 
> This could be replaced by a single instruction:
> kortestbl(kscratch, src);

DONE
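
The same kind of toy check applies here, sketched in C++ under similar assumptions. KORTESTB sets ZF when the OR of its two sources is zero and CF when that OR is all ones, so ORing first and then testing the result against itself yields the same flags as testing the two operands directly. The names `KortestFlags`, `kortest_flags`, `old_kortest_sequence`, `new_kortest_sequence` and `check_kortest_forms` are hypothetical, and the model assumes the ORed value in kscratch is not needed afterwards.

```cpp
#include <cassert>
#include <cstdint>

// Flags produced by the toy KORTESTB model.
struct KortestFlags {
  bool zf;  // OR of the sources is zero
  bool cf;  // OR of the sources is all ones (0xFF for a byte mask)
};

inline KortestFlags kortest_flags(std::uint8_t a, std::uint8_t b) {
  const std::uint8_t r = static_cast<std::uint8_t>(a | b);
  return KortestFlags{r == 0, r == 0xFF};
}

// Quoted sequence: korbl(k, k, src); kortestbl(k, k);
inline KortestFlags old_kortest_sequence(std::uint8_t k, std::uint8_t src) {
  const std::uint8_t ored = static_cast<std::uint8_t>(k | src);
  return kortest_flags(ored, ored);
}

// Suggested sequence: kortestbl(k, src);
inline KortestFlags new_kortest_sequence(std::uint8_t k, std::uint8_t src) {
  return kortest_flags(k, src);
}

// Exhaustively compare both flags over every pair of 8-bit masks.
inline void check_kortest_forms() {
  for (int k = 0; k < 256; ++k) {
    for (int s = 0; s < 256; ++s) {
      const KortestFlags a = old_kortest_sequence(static_cast<std::uint8_t>(k),
                                                  static_cast<std::uint8_t>(s));
      const KortestFlags b = new_kortest_sequence(static_cast<std::uint8_t>(k),
                                                  static_cast<std::uint8_t>(s));
      assert(a.zf == b.zf && a.cf == b.cf);
    }
  }
}
```

Unlike the KTESTB case, both flags agree in this model, because each is a function of the OR result alone.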

> src/hotspot/cpu/x86/x86.ad line 3464:
> 
>> 3462:   %}
>> 3463:   ins_pipe( pipe_slow );
>> 3464: %}
> 
> This overrides the instructs reinterpret_expand* and reinterpret_shrink, so it doesn't look correct.

The two have different predicate conditions.

> src/hotspot/share/opto/compile.cpp line 2367:
> 
>> 2365:     case Op_AndV:
>> 2366:     case Op_OrV:
>> 2367:       return n->req() == 2;
> 
> Why the check on the number of inputs here? And and Or are always binary, aren't they?

This is an interim solution to prevent the macro logic optimization from being applied to masked logical operations. I have a separate patch for masked macro logic inference.

> src/hotspot/share/opto/vector.cpp line 344:
> 
>> 342:   // spilled to a vector though a VectorStoreMaskOperation before actual StoreVector
>> 343:   // operation to vector payload field.
>> 344:   if (is_mask && (value->bottom_type()->isa_vectmask() || bt != T_BOOLEAN)) {
> 
> What happens on platforms that don't support predicate registers? Do we generate an extra VectorStoreMask instruction now?

This is currently agnostic to whether predication is supported. We dump the mask object into a byte vector, which is then used during object reconstruction. This simplifies the reconstruction logic, but yes, it is an overhead for non-predicated targets. One area of improvement, even for predicated targets, is to tie VectorStoreMask to the same controlling edge as the uncommon_trap call so that it does not get scheduled into a higher ancestor block. I have noted these points and will try to address them in a separate follow-up patch.

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/99