[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v3]

Mai Đặng Quân Anh duke at openjdk.java.net
Tue Nov 9 16:22:59 UTC 2021


On Mon, 8 Nov 2021 10:10:54 GMT, Eric Liu <eliu at openjdk.org> wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - support for non-bmi, some refinement
>>  - restore VectorStoreMaskNode, move logic to backend
>
> src/hotspot/share/opto/vectorIntrinsics.cpp line 703:
> 
>> 701:   const Type* maskoper_ty = mopc == Op_VectorMaskToLong ? (const Type*)TypeLong::LONG : (const Type*)TypeInt::INT;
>> 702:   Node* maskoper = gvn().transform(VectorMaskOpNode::make(mask_vec, maskoper_ty, mopc));
>> 703:   if (mopc != Op_VectorMaskToLong) {
> 
> There may be some regressions on AArch64 after refactoring those reduction operations based on this PR.
> 
> With the `VectorStoreMaskNode`, the load-store pair could be optimized. After removing that, `VectorLoadMaskNode` needs extra extension instructions for the various types. E.g.
> 
> 
> After:
>         ldr     h16, [x10, #16]  // LoadVector
>         uxtl    v16.8h, v16.8b
>         uxtl    v16.4s, v16.4h
>         uxtl    v16.2d, v16.2s
>         neg     v16.2d, v16.2d  // VectorLoadMask
>         neg     v17.2d, v16.2d
>         addv    b17, v17.16b
>         umov    w11, v17.b[0]   // TrueCount
> 
> Before:
> 
>         ldr     h16, [x10, #16] // LoadVector
>         addv    b17, v16.8b
>         umov    w11, v17.b[0]   // TrueCount
> 
> 
> There is no silver bullet for this problem. You can either match `VectorMaskTrueCount (VectorLoadMask src)` on AArch64 with the mid-end change in your patch, or leave the mid-end as it is and match `VectorMaskTrueCount (VectorStoreMask src)` on x86 to get the better performance. The latter would be better since it takes less effort.
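
For readers who do not read AArch64 assembly, the two quoted sequences can be modelled in scalar C++. This is only a sketch: it assumes a two-lane `long` mask whose booleans are stored as bytes holding 0 or 1, and the names `BoolLanes`, `true_count_direct`, `true_count_via_load_mask` and `check_all_inputs` are made up for illustration and do not exist in HotSpot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Two boolean lanes stored as bytes (0 or 1), as loaded by `ldr h16`.
using BoolLanes = std::array<std::uint8_t, 2>;

// Model of the "Before" sequence: addv + umov sum the boolean bytes directly.
int true_count_direct(const BoolLanes& b) {
    int sum = 0;
    for (std::uint8_t v : b) sum += v;
    return sum;
}

// Model of the "After" sequence: widen, negate into a mask, negate back, sum bytes.
int true_count_via_load_mask(const BoolLanes& b) {
    std::array<std::uint64_t, 2> lanes{};
    // Three uxtl steps: zero-extend each byte to a 64-bit lane.
    for (std::size_t i = 0; i < lanes.size(); ++i) lanes[i] = b[i];
    // First neg (VectorLoadMask): 1 becomes all ones, 0 stays 0.
    for (auto& l : lanes) l = 0 - l;
    // Second neg (part of TrueCount): all ones becomes 1 again.
    for (auto& l : lanes) l = 0 - l;
    // addv b17, v17.16b: add all 16 bytes, keeping only the low 8 bits.
    unsigned char bytes[16];
    std::memcpy(bytes, lanes.data(), sizeof(bytes));
    std::uint8_t sum = 0;
    for (unsigned char c : bytes) sum = static_cast<std::uint8_t>(sum + c);
    return sum;
}

// Both models should agree for every possible two-lane mask.
void check_all_inputs() {
    for (std::uint8_t a = 0; a <= 1; ++a) {
        for (std::uint8_t b = 0; b <= 1; ++b) {
            BoolLanes m = {a, b};
            assert(true_count_direct(m) == true_count_via_load_mask(m));
        }
    }
}
```

Both functions return the same count; the second one simply does the extra widening and double negation that the longer instruction sequence pays for.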

Thank you very much for your investigation. I have reverted the changes in the mid-end and moved the change to the back-end, as you suggested. Another approach may be to elide the `VectorLoadMaskNode` attached to a mask reduction node, but I think that should be a separate PR later.
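
To make the elision idea concrete, here is a rough sketch on a toy single-input IR. It is not HotSpot code: the `Node` struct, `make` and `elide_load_mask` are hypothetical, only the operation names are taken from this thread, and a real C2 transformation would also have to check vector types and lengths.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy IR node with at most one input (hypothetical, not C2's Node class).
struct Node {
    std::string op;
    std::shared_ptr<Node> in;  // null for leaves
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make(std::string op, NodePtr in = nullptr) {
    return std::make_shared<Node>(Node{std::move(op), std::move(in)});
}

// If a mask reduction consumes a VectorLoadMask, rebuild the reduction so that
// it reads the boolean vector feeding the VectorLoadMask directly.
// Any other shape is returned unchanged.
NodePtr elide_load_mask(const NodePtr& n) {
    if (n && n->op == "VectorMaskTrueCount" &&
        n->in && n->in->op == "VectorLoadMask") {
        return make(n->op, n->in->in);
    }
    return n;
}
```

Applied to `VectorMaskTrueCount (VectorLoadMask (LoadVector))`, this returns `VectorMaskTrueCount (LoadVector)`, which is the shape that would avoid the extension instructions quoted above.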

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/158

