[vectorIntrinsics] RFR: 8343689: AArch64: Optimize MulReduction implementation
Mikhail Ablakatov
mablakatov at openjdk.org
Tue Jan 14 17:26:40 UTC 2025
Add an SVE specialization of the reduce_mul intrinsic for vectors that are 256 bits or longer. It repeatedly multiplies the lower half of the source vector by its upper half using SVE instructions until the result is a 128-bit vector that fits into a SIMD&FP register. From that point on, the existing ASIMD implementation is used.
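The halving strategy can be modeled in scalar code to see why it is valid for integer lanes. In the sketch below, which is illustrative and not HotSpot code, Python lists stand in for SVE registers, and the names `wrap`, `sequential_mul_reduce` and `halving_mul_reduce` are hypothetical. Each loop iteration models one lane-wise vector multiply of the two halves. The final step is only a stand-in for the existing 128-bit ASIMD reduction, not its actual instruction sequence. Wrapping integer multiplication is associative and commutative, so the reordering produces the same result as a sequential product. Floating-point lanes are deliberately not modeled, because reordering can change rounding there.

```python
def wrap(x, lane_bits):
    """Reduce x modulo 2**lane_bits and reinterpret it as a two's-complement lane value."""
    x &= (1 << lane_bits) - 1
    return x - (1 << lane_bits) if x >> (lane_bits - 1) else x


def sequential_mul_reduce(lanes, lane_bits):
    """Reference: multiply the lanes one after another, wrapping like a Java integer type."""
    acc = 1
    for x in lanes:
        acc = wrap(acc * x, lane_bits)
    return acc


def halving_mul_reduce(lanes, lane_bits, target_bits=128):
    """Model of the halving strategy: while the vector is wider than target_bits,
    multiply its lower half by its upper half lane by lane (one vector multiply
    per step), then finish the remaining 128-bit part."""
    v = list(lanes)
    while len(v) * lane_bits > target_bits:
        half = len(v) // 2
        v = [wrap(a * b, lane_bits) for a, b in zip(v[:half], v[half:])]
    # Stand-in for the existing 128-bit ASIMD reduction (hypothetical, scalar only).
    return sequential_mul_reduce(v, lane_bits)


# Usage: a 256-bit vector of 32 byte lanes is halved once, down to 16 lanes (128 bits).
lanes = [wrap((-1) ** i * (2 * i + 1), 8) for i in range(32)]
print(halving_mul_reduce(lanes, 8) == sequential_mul_reduce(lanes, 8))  # True
```

With a 512-bit vector of bytes (64 lanes), the loop runs twice before reaching 16 lanes, which matches the idea of shrinking the vector until it fits into a SIMD&FP register.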
Benchmark results on an AArch64 CPU supporting SVE with a 256-bit vector length:
Benchmark                 (size)   Mode       Old        New  Units
Byte256Vector.MULLanes      1024  thrpt   502.498  10222.717  ops/ms
Double256Vector.MULLanes    1024  thrpt   172.116   3130.997  ops/ms
Float256Vector.MULLanes     1024  thrpt   291.612   4164.138  ops/ms
Int256Vector.MULLanes       1024  thrpt   362.276   3717.213  ops/ms
Long256Vector.MULLanes      1024  thrpt   184.826   2054.345  ops/ms
Short256Vector.MULLanes     1024  thrpt   379.231   5716.223  ops/ms
Benchmark results on an AArch64 CPU supporting SVE with a 512-bit vector length:
Benchmark                 (size)   Mode       Old        New  Units
Byte512Vector.MULLanes      1024  thrpt   160.129   2630.600  ops/ms
Double512Vector.MULLanes    1024  thrpt    51.229   1033.284  ops/ms
Float512Vector.MULLanes     1024  thrpt    84.617   1658.400  ops/ms
Int512Vector.MULLanes       1024  thrpt   109.419   1180.310  ops/ms
Long512Vector.MULLanes      1024  thrpt    69.036    704.144  ops/ms
Short512Vector.MULLanes     1024  thrpt   131.029   1629.632  ops/ms
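The tables report raw throughput only. Dividing New by Old gives the per-benchmark speedup, which ranges from roughly 10x to 20x across these runs. The small helper below is hypothetical and not part of the PR; it simply recomputes those ratios from the numbers reported above.

```python
# (old, new) throughput in ops/ms, taken from the two benchmark tables
results = {
    "Byte256Vector.MULLanes": (502.498, 10222.717),
    "Double256Vector.MULLanes": (172.116, 3130.997),
    "Float256Vector.MULLanes": (291.612, 4164.138),
    "Int256Vector.MULLanes": (362.276, 3717.213),
    "Long256Vector.MULLanes": (184.826, 2054.345),
    "Short256Vector.MULLanes": (379.231, 5716.223),
    "Byte512Vector.MULLanes": (160.129, 2630.600),
    "Double512Vector.MULLanes": (51.229, 1033.284),
    "Float512Vector.MULLanes": (84.617, 1658.400),
    "Int512Vector.MULLanes": (109.419, 1180.310),
    "Long512Vector.MULLanes": (69.036, 704.144),
    "Short512Vector.MULLanes": (131.029, 1629.632),
}


def speedup(old, new):
    """Throughput ratio; a value above 1 means the new implementation is faster."""
    return new / old


# Print one line per benchmark with its speedup factor.
for name, (old, new) in results.items():
    print(f"{name:26s} {speedup(old, new):6.2f}x")
```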
-------------
Commit messages:
- 8343689: AArch64: Optimize MulReduction implementation
Changes: https://git.openjdk.org/panama-vector/pull/225/files
Webrev: https://webrevs.openjdk.org/?repo=panama-vector&pr=225&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8343689
Stats: 332 lines in 7 files changed: 222 ins; 12 del; 98 mod
Patch: https://git.openjdk.org/panama-vector/pull/225.diff
Fetch: git fetch https://git.openjdk.org/panama-vector.git pull/225/head:pull/225
PR: https://git.openjdk.org/panama-vector/pull/225