RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2]

Xiaohong Gong xgong at openjdk.org
Thu Mar 13 01:32:08 UTC 2025


On Wed, 12 Mar 2025 08:04:09 GMT, Hao Sun <haosun at openjdk.org> wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>> 
>>  - Merge branch 'jdk:master' into JDK_8349522
>>  - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations
>>    
>>    Since PR [1] added several new vector operations to the Vector API
>>    along with their x86 backend implementation, this patch adds the
>>    AArch64 backend part for NEON/SVE architectures.
>>    
>>    The performance of the Vector API related JMH microbenchmarks
>>    improves by about 70x ~ 95x on an AArch64 SVE2 machine with a
>>    128-bit vector length, across the different UseSVE options. Here
>>    are the uplift details:
>>    
>>    ```
>>    Benchmark                  (size)  Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2
>>    ByteMaxVector.SADD          1024  thrpt  30    80.69x       79.70x      80.534x
>>    ByteMaxVector.SADDMasked    1024  thrpt  30    84.08x       85.72x      85.901x
>>    ByteMaxVector.SSUB          1024  thrpt  30    80.46x       80.27x      81.063x
>>    ByteMaxVector.SSUBMasked    1024  thrpt  30    83.96x       85.26x      85.887x
>>    ByteMaxVector.SUADD         1024  thrpt  30    80.43x       80.36x      81.761x
>>    ByteMaxVector.SUADDMasked   1024  thrpt  30    83.40x       84.62x      85.199x
>>    ByteMaxVector.SUSUB         1024  thrpt  30    79.93x       79.22x      79.714x
>>    ByteMaxVector.SUSUBMasked   1024  thrpt  30    82.93x       85.02x      84.726x
>>    ByteMaxVector.UMAX          1024  thrpt  30    78.73x       77.39x      78.220x
>>    ByteMaxVector.UMAXMasked    1024  thrpt  30    82.62x       84.77x      85.531x
>>    ByteMaxVector.UMIN          1024  thrpt  30    79.04x       77.80x      78.471x
>>    ByteMaxVector.UMINMasked    1024  thrpt  30    83.11x       84.86x      86.126x
>>    IntMaxVector.SADD           1024  thrpt  30    83.11x       83.07x      83.183x
>>    IntMaxVector.SADDMasked     1024  thrpt  30    90.67x       91.80x      93.162x
>>    IntMaxVector.SSUB           1024  thrpt  30    83.37x       82.82x      83.317x
>>    IntMaxVector.SSUBMasked     1024  thrpt  30    90.85x       92.87x      94.201x
>>    IntMaxVector.SUADD          1024  thrpt  30    82.76x       81.78x      82.679x
>>    IntMaxVector.SUADDMasked    1024  thrpt  30    90.49x       91.93x      93.155x
>>    IntMaxVector.SUSUB          1024  thrpt  30    82.92x       82.34x      82.525x
>>    IntMaxVector.SUSUBMasked    1024  thrpt  30    90.60x      ...
>
> LGTM

Thanks a lot for your reviews, @shqking and @Bhavana-Kilambi!
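
For readers unfamiliar with the operation names in the quoted benchmark table, here is a minimal scalar sketch of what the byte variants are generally understood to compute per lane: SADD/SSUB as signed saturating add/subtract, SUADD/SUSUB as unsigned saturating add/subtract, and UMAX/UMIN as unsigned max/min. The class and method names are hypothetical and chosen only for illustration. This is plain Java reference semantics, not the AArch64 backend code from the patch and not the Vector API itself.

```java
// Hypothetical scalar reference for the per-lane semantics of the byte
// operations named in the benchmark table. Illustration only; this is
// not code from the patch.
public final class SaturatingByteOps {
    private SaturatingByteOps() {}

    // Signed saturating add: clamp the exact sum to [-128, 127].
    static byte sadd(byte a, byte b) {
        int r = a + b;
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, r));
    }

    // Signed saturating subtract: clamp the exact difference to [-128, 127].
    static byte ssub(byte a, byte b) {
        int r = a - b;
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, r));
    }

    // Unsigned saturating add: treat inputs as 0..255, clamp the sum to 255.
    static byte suadd(byte a, byte b) {
        int r = Byte.toUnsignedInt(a) + Byte.toUnsignedInt(b);
        return (byte) Math.min(r, 0xFF);
    }

    // Unsigned saturating subtract: treat inputs as 0..255, clamp at 0.
    static byte susub(byte a, byte b) {
        int r = Byte.toUnsignedInt(a) - Byte.toUnsignedInt(b);
        return (byte) Math.max(r, 0);
    }

    // Unsigned max: compare the bit patterns as 0..255 values.
    static byte umax(byte a, byte b) {
        return Byte.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Unsigned min: compare the bit patterns as 0..255 values.
    static byte umin(byte a, byte b) {
        return Byte.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(sadd((byte) 127, (byte) 1));
        System.out.println(ssub((byte) -128, (byte) 1));
        System.out.println(Byte.toUnsignedInt(suadd((byte) 200, (byte) 100)));
        System.out.println(Byte.toUnsignedInt(susub((byte) 10, (byte) 20)));
        System.out.println(Byte.toUnsignedInt(umax((byte) -1, (byte) 1)));
        System.out.println(Byte.toUnsignedInt(umin((byte) -1, (byte) 1)));
    }
}
```

Without an intrinsic backend, the Vector API falls back to a slow Java path for such operations, which is consistent with the large speedups reported in the table once the backend support is in place.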

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2719511425

