RFR: 8371711: AArch64: SVE intrinsics for Arrays.sort methods (int, float)
Bhavana Kilambi
bkilambi at openjdk.org
Fri Dec 5 14:45:10 UTC 2025
This patch adds an SVE implementation of primitive array sorting (Arrays.sort()) on AArch64 systems that support SVE. On non-SVE machines, we fall back to the existing Java implementation.
For smaller arrays (length <= 64), we use insertion sort; for larger arrays we use an SVE-vectorized quicksort partitioner followed by an odd-even transposition cleanup pass.
The SVE path is enabled by default for int type. For float type, it is available through the experimental flag :
`-XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP
`
Without this flag being enabled, the default Java implementation would be executed for floats (the flag is disabled by default).
Float is gated due to observed regressions on some small/medium sizes. On larger arrays, the SVE float path shows upto 1.47x speedup on Neoverse V2 and 2.12x on Neoverse V1.
Following are the performance numbers for **ArraysSort JMH benchmark** -
**Case A:** Ratio between the scores of master branch and `UseSVELibSimdSortForFP` flag disabled (which is the default).
**Case B:** Ratio between the scores of master branch and `UseSVELibSimdSortForFP` flag enabled (the int numbers will be the same but this now enables SVE vectorized sorting for floats).
**We would want the ratios to be >= 1 to be at par or better than the default Java implementation (master branch).**
On Neoverse V1:
Benchmark (size) Mode Cnt A B
ArraysSort.floatParallelSort 10 avgt 3 0.98 0.98
ArraysSort.floatParallelSort 25 avgt 3 1.01 0.83
ArraysSort.floatParallelSort 50 avgt 3 0.99 0.55
ArraysSort.floatParallelSort 75 avgt 3 0.99 0.66
ArraysSort.floatParallelSort 100 avgt 3 0.98 0.66
ArraysSort.floatParallelSort 1000 avgt 3 1.00 0.84
ArraysSort.floatParallelSort 10000 avgt 3 1.03 1.52
ArraysSort.floatParallelSort 100000 avgt 3 1.03 1.46
ArraysSort.floatParallelSort 1000000 avgt 3 0.98 1.81
ArraysSort.floatSort 10 avgt 3 1.00 0.98
ArraysSort.floatSort 25 avgt 3 1.00 0.81
ArraysSort.floatSort 50 avgt 3 0.99 0.56
ArraysSort.floatSort 75 avgt 3 0.99 0.65
ArraysSort.floatSort 100 avgt 3 0.98 0.70
ArraysSort.floatSort 1000 avgt 3 0.99 0.84
ArraysSort.floatSort 10000 avgt 3 0.99 1.72
ArraysSort.floatSort 100000 avgt 3 1.00 1.94
ArraysSort.floatSort 1000000 avgt 3 1.00 2.13
ArraysSort.intParallelSort 10 avgt 3 1.08 1.08
ArraysSort.intParallelSort 25 avgt 3 1.04 1.05
ArraysSort.intParallelSort 50 avgt 3 1.29 1.30
ArraysSort.intParallelSort 75 avgt 3 1.16 1.16
ArraysSort.intParallelSort 100 avgt 3 1.07 1.07
ArraysSort.intParallelSort 1000 avgt 3 1.13 1.13
ArraysSort.intParallelSort 10000 avgt 3 1.49 1.38
ArraysSort.intParallelSort 100000 avgt 3 1.64 1.62
ArraysSort.intParallelSort 1000000 avgt 3 2.26 2.27
ArraysSort.intSort 10 avgt 3 1.08 1.08
ArraysSort.intSort 25 avgt 3 1.02 1.02
ArraysSort.intSort 50 avgt 3 1.25 1.25
ArraysSort.intSort 75 avgt 3 1.16 1.20
ArraysSort.intSort 100 avgt 3 1.07 1.07
ArraysSort.intSort 1000 avgt 3 1.12 1.13
ArraysSort.intSort 10000 avgt 3 1.94 1.95
ArraysSort.intSort 100000 avgt 3 1.86 1.86
ArraysSort.intSort 1000000 avgt 3 2.09 2.09
On Neoverse V2:
Benchmark (size) Mode Cnt A B
ArraysSort.floatParallelSort 10 avgt 3 1.02 1.02
ArraysSort.floatParallelSort 25 avgt 3 0.97 0.71
ArraysSort.floatParallelSort 50 avgt 3 0.94 0.65
ArraysSort.floatParallelSort 75 avgt 3 0.96 0.82
ArraysSort.floatParallelSort 100 avgt 3 0.95 0.84
ArraysSort.floatParallelSort 1000 avgt 3 1.01 0.94
ArraysSort.floatParallelSort 10000 avgt 3 1.01 1.25
ArraysSort.floatParallelSort 100000 avgt 3 1.01 1.09
ArraysSort.floatParallelSort 1000000 avgt 3 1.00 1.10
ArraysSort.floatSort 10 avgt 3 1.02 1.00
ArraysSort.floatSort 25 avgt 3 0.99 0.76
ArraysSort.floatSort 50 avgt 3 0.97 0.66
ArraysSort.floatSort 75 avgt 3 1.01 0.83
ArraysSort.floatSort 100 avgt 3 1.00 0.85
ArraysSort.floatSort 1000 avgt 3 0.99 0.93
ArraysSort.floatSort 10000 avgt 3 1.00 1.28
ArraysSort.floatSort 100000 avgt 3 1.00 1.37
ArraysSort.floatSort 1000000 avgt 3 1.00 1.48
ArraysSort.intParallelSort 10 avgt 3 1.05 1.05
ArraysSort.intParallelSort 25 avgt 3 0.99 0.84
ArraysSort.intParallelSort 50 avgt 3 1.03 1.14
ArraysSort.intParallelSort 75 avgt 3 0.91 0.99
ArraysSort.intParallelSort 100 avgt 3 0.98 0.96
ArraysSort.intParallelSort 1000 avgt 3 1.32 1.30
ArraysSort.intParallelSort 10000 avgt 3 1.40 1.40
ArraysSort.intParallelSort 100000 avgt 3 1.00 1.04
ArraysSort.intParallelSort 1000000 avgt 3 1.15 1.14
ArraysSort.intSort 10 avgt 3 1.05 1.05
ArraysSort.intSort 25 avgt 3 1.03 1.03
ArraysSort.intSort 50 avgt 3 1.08 1.14
ArraysSort.intSort 75 avgt 3 0.88 0.98
ArraysSort.intSort 100 avgt 3 1.01 0.99
ArraysSort.intSort 1000 avgt 3 1.3 1.32
ArraysSort.intSort 10000 avgt 3 1.43 1.43
ArraysSort.intSort 100000 avgt 3 1.30 1.30
ArraysSort.intSort 1000000 avgt 3 1.37 1.37
-------------
Commit messages:
- AArch64 SVE implementation for Arrays.sort
- Prepare base for SVE implementation in libsimdsort
Changes: https://git.openjdk.org/jdk/pull/28675/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28675&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8371711
Stats: 1109 lines in 26 files changed: 1092 ins; 12 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/28675.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28675/head:pull/28675
PR: https://git.openjdk.org/jdk/pull/28675
More information about the hotspot-dev
mailing list