RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v3]

Xiaohong Gong xgong at openjdk.org
Fri Mar 14 09:34:47 UTC 2025


> Since PR [1] added several new vector operations to the Vector API, along with their x86 backend implementation, this patch adds the AArch64 backend for NEON/SVE architectures.
> 
> Performance of the related Vector API JMH microbenchmarks improves by about 70x ~ 95x on an NVIDIA Grace CPU, an SVE2 architecture with a 128-bit vector length, across the different UseSVE options. Here are the details of the gains:
> 
> 
> Benchmark                  (size)  Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2
> ByteMaxVector.SADD          1024  thrpt  30    80.69x       79.70x      80.534x
> ByteMaxVector.SADDMasked    1024  thrpt  30    84.08x       85.72x      85.901x
> ByteMaxVector.SSUB          1024  thrpt  30    80.46x       80.27x      81.063x
> ByteMaxVector.SSUBMasked    1024  thrpt  30    83.96x       85.26x      85.887x
> ByteMaxVector.SUADD         1024  thrpt  30    80.43x       80.36x      81.761x
> ByteMaxVector.SUADDMasked   1024  thrpt  30    83.40x       84.62x      85.199x
> ByteMaxVector.SUSUB         1024  thrpt  30    79.93x       79.22x      79.714x
> ByteMaxVector.SUSUBMasked   1024  thrpt  30    82.93x       85.02x      84.726x
> ByteMaxVector.UMAX          1024  thrpt  30    78.73x       77.39x      78.220x
> ByteMaxVector.UMAXMasked    1024  thrpt  30    82.62x       84.77x      85.531x
> ByteMaxVector.UMIN          1024  thrpt  30    79.04x       77.80x      78.471x
> ByteMaxVector.UMINMasked    1024  thrpt  30    83.11x       84.86x      86.126x
> IntMaxVector.SADD           1024  thrpt  30    83.11x       83.07x      83.183x
> IntMaxVector.SADDMasked     1024  thrpt  30    90.67x       91.80x      93.162x
> IntMaxVector.SSUB           1024  thrpt  30    83.37x       82.82x      83.317x
> IntMaxVector.SSUBMasked     1024  thrpt  30    90.85x       92.87x      94.201x
> IntMaxVector.SUADD          1024  thrpt  30    82.76x       81.78x      82.679x
> IntMaxVector.SUADDMasked    1024  thrpt  30    90.49x       91.93x      93.155x
> IntMaxVector.SUSUB          1024  thrpt  30    82.92x       82.34x      82.525x
> IntMaxVector.SUSUBMasked    1024  thrpt  30    90.60x       92.12x      92.951x
> IntMaxVector.UMAX           1024  thrpt  30    82.40x       81.85x      82.242x
> IntMaxVector.UMAXMasked     1024  thrpt  30    90.30x       92.10x      92.587x
> IntMaxVector.UMIN           1024  thrpt  30    82.84x       81.43x      82.801x
> IntMaxVector.UMINMasked     1024  thrpt  30    90.43x       91.49x      92.678x
> LongMaxVector.SADD          1024  thrpt  30    82.01x       81.74x      82.153x
> LongMaxVector...
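
For readers unfamiliar with the operator names in the table, here is a minimal scalar sketch of what each operation computes for a single lane: SADD/SSUB are signed saturating add/subtract, SUADD/SUSUB are unsigned saturating add/subtract, and UMAX/UMIN are unsigned max/min. The class and method names are hypothetical and exist only for illustration; this is not code from the patch, which implements these operations in the AArch64 backend rather than in Java.

```java
// Scalar reference semantics for one vector lane.
// All names here are hypothetical; this is an illustration, not the patch.
final class SaturatingLaneOps {

    // SADD on a byte lane: signed add, clamped to [-128, 127].
    static byte saddByte(byte a, byte b) {
        int sum = a + b; // widened to int, so no wraparound here
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, sum));
    }

    // SSUB on a byte lane: signed subtract, clamped to [-128, 127].
    static byte ssubByte(byte a, byte b) {
        int diff = a - b;
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, diff));
    }

    // SUADD on a byte lane: operands read as 0..255, result clamped at 255.
    static byte suaddByte(byte a, byte b) {
        int sum = Byte.toUnsignedInt(a) + Byte.toUnsignedInt(b);
        return (byte) Math.min(sum, 0xFF);
    }

    // SUSUB on a byte lane: operands read as 0..255, result clamped at 0.
    static byte susubByte(byte a, byte b) {
        int diff = Byte.toUnsignedInt(a) - Byte.toUnsignedInt(b);
        return (byte) Math.max(diff, 0);
    }

    // UMAX on an int lane: maximum under unsigned comparison.
    static int umaxInt(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // UMIN on an int lane: minimum under unsigned comparison.
    static int uminInt(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(saddByte((byte) 127, (byte) 1));      // 127 (saturated)
        System.out.println(ssubByte((byte) -128, (byte) 1));     // -128 (saturated)
        System.out.println(Byte.toUnsignedInt(
                suaddByte((byte) 200, (byte) 100)));             // 255 (saturated)
        System.out.println(Byte.toUnsignedInt(
                susubByte((byte) 5, (byte) 10)));                // 0 (saturated)
        System.out.println(Integer.toUnsignedString(
                umaxInt(-1, 1)));                                // 4294967295
        System.out.println(uminInt(-1, 1));                      // 1
    }
}
```

A vectorized backend applies the same rule to every lane at once; the "Masked" benchmark variants apply it only to lanes selected by a mask.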

Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:

  Fix IR test failure on X64 with UseAVX=1

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23608/files
  - new: https://git.openjdk.org/jdk/pull/23608/files/9aa97fb0..30bbbde5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23608&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23608&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23608.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23608/head:pull/23608

PR: https://git.openjdk.org/jdk/pull/23608


More information about the hotspot-compiler-dev mailing list