[vectorIntrinsics+mask] RFR: 8271313: AArch64: SVE backend support for masking operations with predicate feature

Fri Jul 30 04:11:45 UTC 2021

On Fri, 30 Jul 2021 03:31:14 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> This is the initial SVE backend implementation for all masking operations using the SVE predicate feature. It contains:
>  - SVE codegen for vector operations controlled by a mask
>  - SVE codegen for vector mask operations using predicate instructions
> 
> The size of libjvm.so increases by about 1.86% after adding all the backend changes. The performance gain ranges from about 1.037x (3.7%) to 7.88x for some masking operations of IntMaxVector with 512-bit SVE:
> 
> Benchmark                         Gain
> IntMaxVector.ABSMasked            1.198
> IntMaxVector.ADDMasked            1.040
> IntMaxVector.ADDMaskedLanes       1.068
> IntMaxVector.ANDMasked            1.117
> IntMaxVector.ANDMaskedLanes       1.101
> IntMaxVector.AND_NOTMasked        1.037
> IntMaxVector.ASHRMasked           1.286
> IntMaxVector.ASHRMaskedShift      1.096
> IntMaxVector.BITWISE_BLENDMasked  1.085
> IntMaxVector.LSHRMasked           1.405
> IntMaxVector.LSHRMaskedShift      1.092
> IntMaxVector.MAXMaskedLanes       1.079
> IntMaxVector.MINMaskedLanes       1.079
> IntMaxVector.MULMasked            1.370
> IntMaxVector.ORMasked             1.038
> IntMaxVector.ORMaskedLanes        1.103
> IntMaxVector.SUBMasked            1.043
> IntMaxVector.XORMasked            1.151
> IntMaxVector.XORMaskedLanes       1.103
> IntMaxVector.allTrue              1.157
> IntMaxVector.anyTrue              1.158
> IntMaxVector.gatherMasked         7.880
> IntMaxVector.scatterMasked        4.732

So is the performance gain reasonable, especially for something like IntMaxVector.AND_NOTMasked?
Thanks.
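
For reference, here is a rough scalar sketch of what a masked lanewise operation such as AND_NOTMasked computes per lane, assuming the Vector API semantics of `a.lanewise(VectorOperators.AND_NOT, b, mask)`: set lanes get `a & ~b`, unset lanes keep the value from `a`. The class and method names are hypothetical and only for illustration; this does not reflect the benchmark source or the SVE codegen in the PR.

```java
import java.util.Arrays;

public class MaskedLanesSketch {
    // Scalar model of a masked lanewise AND_NOT (hypothetical helper):
    // lanes whose mask bit is set get a & ~b, the others keep a's value.
    static int[] andNotMasked(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? (a[i] & ~b[i]) : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0b1111, 0b1010, 0b0110, 0b0001};
        int[] b = {0b0101, 0b0010, 0b0110, 0b0001};
        boolean[] mask = {true, false, true, false};
        // prints [10, 10, 0, 1]
        System.out.println(Arrays.toString(andNotMasked(a, b, mask)));
    }
}
```

Without predicate support, a backend would typically compute the unmasked result and then blend it with the first operand; with predicated instructions the select can fold into the operation itself, which may be why simple bitwise operations show smaller gains than gather/scatter.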

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/105