[vectorIntrinsics] RFR: 8343689: AArch64: Optimize MulReduction implementation

Tue Jan 14 17:26:40 UTC 2025

Add an SVE specialization of the reduce_mul intrinsic for vectors that are 256 bits or longer. It multiplies the upper half of the source vector into the lower half using SVE instructions until the intermediate result is a 128-bit vector that fits into a SIMD&FP register. From that point on, the existing ASIMD implementation is used.
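The halving scheme can be modeled in scalar Java for 32-bit int lanes, for example Int256Vector with 8 lanes or Int512Vector with 16. This is a sketch under assumptions. The class and method names are hypothetical, the lane count is assumed to be a power of two of at least four, and the code does not reproduce the SVE or ASIMD instruction sequences in the patch. Int multiplication wraps modulo 2^32 and is associative and commutative, so the regrouped product matches a plain sequential one.

```java
// Hypothetical scalar model of the halving multiply reduction (not HotSpot code).
final class HalvingMulReduction {
    // 128 bits hold four 32-bit int lanes.
    static final int LANES_IN_128_BITS = 128 / Integer.SIZE;

    // Assumes lanes.length is a power of two and at least LANES_IN_128_BITS.
    static int reduceMul(int[] lanes) {
        int[] v = lanes.clone();
        int n = v.length;
        // SVE-style phase: fold the upper half into the lower half until 128 bits remain.
        while (n > LANES_IN_128_BITS) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                v[i] *= v[i + half];
            }
            n = half;
        }
        // ASIMD-style phase (modeled as a simple loop): reduce the 128-bit remainder to a scalar.
        int acc = 1;
        for (int i = 0; i < n; i++) {
            acc *= v[i];
        }
        return acc;
    }

    // Reference: plain left-to-right product with the same int wraparound.
    static int sequentialMul(int[] lanes) {
        int acc = 1;
        for (int x : lanes) {
            acc *= x;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] int256 = {1, 2, 3, 4, 5, 6, 7, 8};  // 8 lanes = 256 bits
        int[] int512 = new int[16];               // 16 lanes = 512 bits
        for (int i = 0; i < int512.length; i++) {
            int512[i] = 3 * i + 1;
        }
        System.out.println(reduceMul(int256) + " " + sequentialMul(int256));
        System.out.println(reduceMul(int512) + " " + sequentialMul(int512));
    }
}
```

The model uses int lanes because integer multiplication is associative. Floating-point products can differ in the last bits when their factors are regrouped.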

Benchmark results on an AArch64 CPU with SVE support and a 256-bit vector length:

  Benchmark                 (size)   Mode      Old        New  Units
  Byte256Vector.MULLanes      1024  thrpt  502.498  10222.717 ops/ms
  Double256Vector.MULLanes    1024  thrpt  172.116   3130.997 ops/ms
  Float256Vector.MULLanes     1024  thrpt  291.612   4164.138 ops/ms
  Int256Vector.MULLanes       1024  thrpt  362.276   3717.213 ops/ms
  Long256Vector.MULLanes      1024  thrpt  184.826   2054.345 ops/ms
  Short256Vector.MULLanes     1024  thrpt  379.231   5716.223 ops/ms

Benchmark results on an AArch64 CPU with SVE support and a 512-bit vector length:

  Benchmark                 (size)   Mode      Old       New   Units
  Byte512Vector.MULLanes      1024  thrpt  160.129  2630.600  ops/ms
  Double512Vector.MULLanes    1024  thrpt   51.229  1033.284  ops/ms
  Float512Vector.MULLanes     1024  thrpt   84.617  1658.400  ops/ms
  Int512Vector.MULLanes       1024  thrpt  109.419  1180.310  ops/ms
  Long512Vector.MULLanes      1024  thrpt   69.036   704.144  ops/ms
  Short512Vector.MULLanes     1024  thrpt  131.029  1629.632  ops/ms
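The improvement is easier to compare across element types as a New/Old throughput ratio. The short Java program below recomputes those ratios from the two tables above. The ops/ms values are copied verbatim from the tables; the class, field, and method names are hypothetical.

```java
import java.util.Locale;

// Recomputes New/Old throughput ratios for the MULLanes benchmarks listed above.
final class MulLanesSpeedup {
    static final String[] NAMES = {"Byte", "Double", "Float", "Int", "Long", "Short"};
    // Each row is {Old, New} in ops/ms, in the same order as NAMES.
    static final double[][] OLD_NEW_256 = {
        {502.498, 10222.717}, {172.116, 3130.997}, {291.612, 4164.138},
        {362.276, 3717.213}, {184.826, 2054.345}, {379.231, 5716.223}};
    static final double[][] OLD_NEW_512 = {
        {160.129, 2630.600}, {51.229, 1033.284}, {84.617, 1658.400},
        {109.419, 1180.310}, {69.036, 704.144}, {131.029, 1629.632}};

    static double speedup(double[] oldNew) {
        return oldNew[1] / oldNew[0];
    }

    static void print(String suffix, double[][] rows) {
        for (int i = 0; i < rows.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%s%s.MULLanes: %.2fx",
                    NAMES[i], suffix, speedup(rows[i])));
        }
    }

    public static void main(String[] args) {
        print("256Vector", OLD_NEW_256);
        print("512Vector", OLD_NEW_512);
    }
}
```

Across both tables the ratios fall between roughly 10x (Long512Vector, Int256Vector) and 20x (Byte256Vector, Double512Vector).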

-------------

Commit messages:
 - 8343689: AArch64: Optimize MulReduction implementation

Changes: https://git.openjdk.org/panama-vector/pull/225/files
  Webrev: https://webrevs.openjdk.org/?repo=panama-vector&pr=225&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8343689
  Stats: 332 lines in 7 files changed: 222 ins; 12 del; 98 mod
  Patch: https://git.openjdk.org/panama-vector/pull/225.diff
  Fetch: git fetch https://git.openjdk.org/panama-vector.git pull/225/head:pull/225

PR: https://git.openjdk.org/panama-vector/pull/225