RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations

Xiaohong Gong xgong at openjdk.org
Thu Feb 13 01:54:49 UTC 2025


PR [1] added several new vector operations to the Vector API, together with their x86 backend implementation. This patch adds the corresponding AArch64 backend implementation for NEON and SVE.
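
For readers unfamiliar with the new operations, here is a minimal scalar sketch of what the saturating and unsigned operators compute per `int` lane, as I understand the semantics from the operator names used in the benchmarks below (SADD, SSUB, SUADD, SUSUB, UMAX, UMIN). The class and method names are hypothetical and exist only for illustration. This is neither the Vector API itself nor the code the AArch64 backend emits.

```java
// Hypothetical scalar reference for one int lane; not part of the JDK or this patch.
final class SaturatingScalarRef {
    private SaturatingScalarRef() {}

    // SADD: signed add, clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int sadd(int a, int b) {
        long r = (long) a + (long) b;
        return (int) Math.max((long) Integer.MIN_VALUE, Math.min((long) Integer.MAX_VALUE, r));
    }

    // SSUB: signed subtract, clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int ssub(int a, int b) {
        long r = (long) a - (long) b;
        return (int) Math.max((long) Integer.MIN_VALUE, Math.min((long) Integer.MAX_VALUE, r));
    }

    // SUADD: unsigned add, clamped to the unsigned maximum 0xFFFFFFFF (bit pattern -1).
    static int suadd(int a, int b) {
        long r = Integer.toUnsignedLong(a) + Integer.toUnsignedLong(b);
        return r > 0xFFFF_FFFFL ? -1 : (int) r;
    }

    // SUSUB: unsigned subtract, clamped to 0 when b is larger than a (as unsigned values).
    static int susub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN: maximum and minimum with both inputs interpreted as unsigned.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

For example, `sadd(Integer.MAX_VALUE, 1)` stays at `Integer.MAX_VALUE` instead of wrapping around, and `umax(-1, 1)` returns `-1` because the bit pattern `0xFFFFFFFF` is the largest unsigned value.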

The related Vector API JMH microbenchmarks speed up by about 70x to 95x on an AArch64 SVE2 machine with a 128-bit vector length, across the different UseSVE options. The table below lists the gain for each benchmark and option:


Benchmark                  (size)  Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2
ByteMaxVector.SADD          1024  thrpt  30    80.69x       79.70x      80.534x
ByteMaxVector.SADDMasked    1024  thrpt  30    84.08x       85.72x      85.901x
ByteMaxVector.SSUB          1024  thrpt  30    80.46x       80.27x      81.063x
ByteMaxVector.SSUBMasked    1024  thrpt  30    83.96x       85.26x      85.887x
ByteMaxVector.SUADD         1024  thrpt  30    80.43x       80.36x      81.761x
ByteMaxVector.SUADDMasked   1024  thrpt  30    83.40x       84.62x      85.199x
ByteMaxVector.SUSUB         1024  thrpt  30    79.93x       79.22x      79.714x
ByteMaxVector.SUSUBMasked   1024  thrpt  30    82.93x       85.02x      84.726x
ByteMaxVector.UMAX          1024  thrpt  30    78.73x       77.39x      78.220x
ByteMaxVector.UMAXMasked    1024  thrpt  30    82.62x       84.77x      85.531x
ByteMaxVector.UMIN          1024  thrpt  30    79.04x       77.80x      78.471x
ByteMaxVector.UMINMasked    1024  thrpt  30    83.11x       84.86x      86.126x
IntMaxVector.SADD           1024  thrpt  30    83.11x       83.07x      83.183x
IntMaxVector.SADDMasked     1024  thrpt  30    90.67x       91.80x      93.162x
IntMaxVector.SSUB           1024  thrpt  30    83.37x       82.82x      83.317x
IntMaxVector.SSUBMasked     1024  thrpt  30    90.85x       92.87x      94.201x
IntMaxVector.SUADD          1024  thrpt  30    82.76x       81.78x      82.679x
IntMaxVector.SUADDMasked    1024  thrpt  30    90.49x       91.93x      93.155x
IntMaxVector.SUSUB          1024  thrpt  30    82.92x       82.34x      82.525x
IntMaxVector.SUSUBMasked    1024  thrpt  30    90.60x       92.12x      92.951x
IntMaxVector.UMAX           1024  thrpt  30    82.40x       81.85x      82.242x
IntMaxVector.UMAXMasked     1024  thrpt  30    90.30x       92.10x      92.587x
IntMaxVector.UMIN           1024  thrpt  30    82.84x       81.43x      82.801x
IntMaxVector.UMINMasked     1024  thrpt  30    90.43x       91.49x      92.678x
LongMaxVector.SADD          1024  thrpt  30    82.01x       81.74x      82.153x
LongMaxVector.SADDMasked    1024  thrpt  30    91.61x       92.69x      93.579x
LongMaxVector.SSUB          1024  thrpt  30    81.97x       81.42x      82.991x
LongMaxVector.SSUBMasked    1024  thrpt  30    91.34x       92.47x      93.026x
LongMaxVector.SUADD         1024  thrpt  30    82.44x       81.29x      82.506x
LongMaxVector.SUADDMasked   1024  thrpt  30    92.21x       92.35x      93.419x
LongMaxVector.SUSUB         1024  thrpt  30    82.04x       80.98x      81.761x
LongMaxVector.SUSUBMasked   1024  thrpt  30    91.74x       92.39x      93.375x
LongMaxVector.UMAX          1024  thrpt  30    81.59x       80.21x      82.162x
LongMaxVector.UMAXMasked    1024  thrpt  30    70.09x       92.89x      93.627x
LongMaxVector.UMIN          1024  thrpt  30    82.31x       81.95x      82.298x
LongMaxVector.UMINMasked    1024  thrpt  30    69.85x       92.19x      93.390x
ShortMaxVector.SADD         1024  thrpt  30    80.08x       79.15x      80.310x
ShortMaxVector.SADDMasked   1024  thrpt  30    90.74x       92.00x      93.743x
ShortMaxVector.SSUB         1024  thrpt  30    79.54x       78.67x      80.584x
ShortMaxVector.SSUBMasked   1024  thrpt  30    91.18x       92.10x      93.725x
ShortMaxVector.SUADD        1024  thrpt  30    79.86x       79.37x      80.372x
ShortMaxVector.SUADDMasked  1024  thrpt  30    90.17x       92.43x      93.759x
ShortMaxVector.SUSUB        1024  thrpt  30    79.78x       79.85x      80.744x
ShortMaxVector.SUSUBMasked  1024  thrpt  30    89.99x       91.91x      93.320x
ShortMaxVector.UMAX         1024  thrpt  30    79.87x       79.81x      80.518x
ShortMaxVector.UMAXMasked   1024  thrpt  30    89.69x       91.70x      92.826x
ShortMaxVector.UMIN         1024  thrpt  30    79.11x       77.98x      79.458x
ShortMaxVector.UMINMasked   1024  thrpt  30    90.49x       92.86x      93.323x


Tested with `hotspot::hotspot_all` and `jdk::jdk_all`; no new regressions were found.

[1] https://github.com/openjdk/jdk/pull/20507

-------------

Commit messages:
 - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations

Changes: https://git.openjdk.org/jdk/pull/23608/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23608&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8349522
  Stats: 1137 lines in 8 files changed: 673 ins; 3 del; 461 mod
  Patch: https://git.openjdk.org/jdk/pull/23608.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23608/head:pull/23608

PR: https://git.openjdk.org/jdk/pull/23608

