[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v2]

Eric Liu eliu at openjdk.java.net
Mon Nov 8 10:16:10 UTC 2021


On Wed, 3 Nov 2021 07:55:57 GMT, Mai Đặng Quân Anh <duke at openjdk.java.net> wrote:

>> Hi,
>> This patch improves the logic of vector mask reduction operations on AVX, especially int, float, long, double, by using vmovmskpd and vmovmskps instructions. I also do a little refactoring to reduce duplication in toLong. The patch temporarily disables these operations on Neon, though.
>> Thank you very much.
>
> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - fix last true
>  - further improvement

src/hotspot/share/opto/vectorIntrinsics.cpp line 703:

> 701:   if (mask_vec->bottom_type()->isa_vectmask() == NULL) {
> 702:     mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem));
> 703:   }

There may be some regressions on AArch64 after refactoring those reduction operations based on this PR.

With the `VectorStoreMaskNode`, the load-store pair could be optimized. After removing it, `VectorLoadMaskNode` needs extra extension instructions for the various element types. E.g.:


After:

        ldr     h16, [x10, #16]  // LoadVector
        uxtl    v16.8h, v16.8b
        uxtl    v16.4s, v16.4h
        uxtl    v16.2d, v16.2s
        neg     v16.2d, v16.2d  // VectorLoadMask
        neg     v17.2d, v16.2d
        addv    b17, v17.16b
        umov    w11, v17.b[0]   // TrueCount
        sxtw    x0, w11         // StoreVector

Before:

        ldr     h16, [x10, #16] // LoadVector
        addv    b17, v16.8b
        umov    w11, v17.b[0]   // TrueCount
        sxtw    x0, w11         // StoreVector
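
To illustrate why the two sequences compute the same count, here is a scalar C++ sketch of both paths for a 2-lane long vector. It is only a model under stated assumptions, not HotSpot code: the function names and `kLanes` are hypothetical, and booleans are assumed to be stored as 0/1 bytes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical lane count: two 64-bit lanes, as in the v16.2d listings.
constexpr int kLanes = 2;

// Models the "Before" sequence: the boolean bytes (assumed 0 or 1) are
// summed directly, which is what addv on the loaded bytes does.
int true_count_from_bytes(const std::array<uint8_t, kLanes>& bools) {
    int count = 0;
    for (uint8_t b : bools) {
        count += b;
    }
    return count;
}

// Models the "After" sequence: each byte is widened to 64 bits (the three
// uxtl steps), negated into a 0 / -1 lane mask (VectorLoadMask), then
// negated back to 0 / 1 and summed (TrueCount).
int true_count_via_mask(const std::array<uint8_t, kLanes>& bools) {
    std::array<int64_t, kLanes> mask{};
    for (int i = 0; i < kLanes; ++i) {
        int64_t widened = bools[i];  // zero-extend byte to 64 bits
        mask[i] = -widened;          // 0 stays 0, 1 becomes -1
    }
    int count = 0;
    for (int64_t m : mask) {
        count += static_cast<int>(-m);  // -1 becomes 1 again
    }
    return count;
}
```

Both functions return the same value for any 0/1 input, so the widening and the two negations in the "After" path are pure overhead for this operation.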


There is no silver bullet for this problem. Either you match `VectorMaskTrueCount (VectorLoadMask src)` on AArch64 to go with the mid-end change in your patch, or you leave the mid-end as it is and match `VectorMaskTrueCount (VectorStoreMask src)` on x86 to get the better performance. The latter would be better, since it takes less effort.

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/158

