RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22]
Srinivas Vamsi Parasa
duke at openjdk.org
Thu Aug 24 06:26:42 UTC 2023
On Thu, 17 Aug 2023 17:23:01 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them.
Hello Vladimir (@vnkozlov) ,
As per your suggestion, the implementation was adjusted to address the regressions caused for STAGGER and REPEATED type of input data patterns.
Please see below the new JMH performance data using the adjusted implementation.
In the new implementation, we don't call the AVX512 sort intrinsic at the top level (`Arrays.sort()`) . Instead, we take a decomposed or a 'building blocks' approach where we only intrinsify (using AVX512 instructions) the partitioning and small array sort functions used in the current baseline ( `DualPivotQuickSort.sort()` ). Since the current baseline has logic to detect and sort special input patterns like STAGGER, we fallback to the current baseline instead of using AVX512 partitioning and sorting (which works best for RANDOM, REPEATED and SHUFFLE patterns).
Data below shows `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568)
- The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
- The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup.
| Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 partitioning & sort (us/op) | Speedup |
| --- | --- | --- | --- | --- | --- |
| ArraysSort.Double.testSort | RANDOM | 800 | 6.7 | 4.8 | 1.39x |
| ArraysSort.Double.testSort | RANDOM | 7000 | 234.1 | 51.5 | **4.55x** |
| ArraysSort.Double.testSort | RANDOM | 50000 | 2155.9 | 470.0 | **4.59x** |
| ArraysSort.Double.testSort | RANDOM | 300000 | 15076.3 | 3391.3 | **4.45x** |
| ArraysSort.Double.testSort | RANDOM | 2000000 | 116445.5 | 27491.7 | **4.24x** |
| ArraysSort.Double.testSort | REPEATED | 800 | 2.3 | 1.7 | 1.35x |
| ArraysSort.Double.testSort | REPEATED | 7000 | 23.3 | 12.1 | **1.92x** |
| ArraysSort.Double.testSort | REPEATED | 50000 | 460.9 | 151.9 | **3.03x** |
| ArraysSort.Double.testSort | REPEATED | 300000 | 2935.1 | 1082.5 | **2.71x** |
| ArraysSort.Double.testSort | REPEATED | 2000000 | 19533.8 | 8158.6 | **2.39x** |
| ArraysSort.Double.testSort | SHUFFLE | 800 | 4.6 | 4.4 | 1.04x |
| ArraysSort.Double.testSort | SHUFFLE | 7000 | 86.7 | 48.7 | **1.78x** |
| ArraysSort.Double.testSort | SHUFFLE | 50000 | 839.0 | 436.4 | **1.92x** |
| ArraysSort.Double.testSort | SHUFFLE | 300000 | 5761.0 | 3036.9 | **1.90x** |
| ArraysSort.Double.testSort | SHUFFLE | 2000000 | 38044.3 | 23917.2 | 1.59x |
| ArraysSort.Double.testSort | STAGGER | 800 | 2.5 | 2.6 | 0.96x |
| ArraysSort.Double.testSort | STAGGER | 7000 | 28.6 | 22.7 | 1.26x |
| ArraysSort.Double.testSort | STAGGER | 50000 | 141.4 | 142.6 | 0.99x |
| ArraysSort.Double.testSort | STAGGER | 300000 | 1069.9 | 892.6 | 1.20x |
| ArraysSort.Double.testSort | STAGGER | 2000000 | 6693.0 | 6686.1 | 1.00x |
| ArraysSort.Float.testSort | RANDOM | 800 | 6.8 | 4.1 | 1.66x |
| ArraysSort.Float.testSort | RANDOM | 7000 | 233.0 | 38.7 | **6.02x** |
| ArraysSort.Float.testSort | RANDOM | 50000 | 2202.2 | 353.9 | **6.22x** |
| ArraysSort.Float.testSort | RANDOM | 300000 | 15084.3 | 2374.9 | **6.35x** |
| ArraysSort.Float.testSort | RANDOM | 2000000 | 117961.2 | 17431.5 | **6.77x** |
| ArraysSort.Float.testSort | REPEATED | 800 | 2.3 | 1.5 | 1.53x |
| ArraysSort.Float.testSort | REPEATED | 7000 | 28.2 | 8.7 | **3.24x** |
| ArraysSort.Float.testSort | REPEATED | 50000 | 467.5 | 118.4 | **3.95x** |
| ArraysSort.Float.testSort | REPEATED | 300000 | 2976.0 | 974.8 | **3.05x** |
| ArraysSort.Float.testSort | REPEATED | 2000000 | 18910.0 | 6574.2 | **2.88x** |
| ArraysSort.Float.testSort | SHUFFLE | 800 | 4.6 | 3.5 | 1.30x |
| ArraysSort.Float.testSort | SHUFFLE | 7000 | 84.2 | 36.3 | **2.32x** |
| ArraysSort.Float.testSort | SHUFFLE | 50000 | 838.9 | 323.9 | **2.59x** |
| ArraysSort.Float.testSort | SHUFFLE | 300000 | 5704.2 | 2246.8 | **2.54x** |
| ArraysSort.Float.testSort | SHUFFLE | 2000000 | 37341.8 | 16043.9 | **2.33x** |
| ArraysSort.Float.testSort | STAGGER | 800 | 2.2 | 2.3 | 0.99x |
| ArraysSort.Float.testSort | STAGGER | 7000 | 20.5 | 20.9 | 0.98x |
| ArraysSort.Float.testSort | STAGGER | 50000 | 132.1 | 130.4 | 1.01x |
| ArraysSort.Float.testSort | STAGGER | 300000 | 802.9 | 836.3 | 0.96x |
| ArraysSort.Float.testSort | STAGGER | 2000000 | 5584.2 | 5587.9 | 1.00x |
| ArraysSort.Int.testSort | RANDOM | 800 | 6.2 | 3.4 | 1.84x |
| ArraysSort.Int.testSort | RANDOM | 7000 | 210.0 | 31.9 | **6.59x** |
| ArraysSort.Int.testSort | RANDOM | 50000 | 2068.5 | 297.9 | **6.94x** |
| ArraysSort.Int.testSort | RANDOM | 300000 | 14058.4 | 2104.9 | **6.68x** |
| ArraysSort.Int.testSort | RANDOM | 2000000 | 114645.1 | 15266.0 | **7.51x** |
| ArraysSort.Int.testSort | REPEATED | 800 | 1.6 | 0.9 | 1.76x |
| ArraysSort.Int.testSort | REPEATED | 7000 | 25.2 | 3.5 | **7.15x** |
| ArraysSort.Int.testSort | REPEATED | 50000 | 332.4 | 26.8 | **12.39x** |
| ArraysSort.Int.testSort | REPEATED | 300000 | 2012.2 | 147.5 | **13.64x** |
| ArraysSort.Int.testSort | REPEATED | 2000000 | 11870.5 | 1099.9 | **10.79x** |
| ArraysSort.Int.testSort | SHUFFLE | 800 | 4.4 | 2.9 | 1.53x |
| ArraysSort.Int.testSort | SHUFFLE | 7000 | 79.5 | 30.4 | **2.61x** |
| ArraysSort.Int.testSort | SHUFFLE | 50000 | 771.2 | 275.6 | **2.80x** |
| ArraysSort.Int.testSort | SHUFFLE | 300000 | 5140.2 | 1995.7 | **2.58x** |
| ArraysSort.Int.testSort | SHUFFLE | 2000000 | 34605.7 | 14190.4 | **2.44x** |
| ArraysSort.Int.testSort | STAGGER | 800 | 1.7 | 1.7 | 0.99x |
| ArraysSort.Int.testSort | STAGGER | 7000 | 15.8 | 15.9 | 1.00x |
| ArraysSort.Int.testSort | STAGGER | 50000 | 97.3 | 96.9 | 1.00x |
| ArraysSort.Int.testSort | STAGGER | 300000 | 588.9 | 596.9 | 0.99x |
| ArraysSort.Int.testSort | STAGGER | 2000000 | 3940.4 | 4006.4 | 0.98x |
| ArraysSort.Long.testSort | RANDOM | 800 | 6.4 | 4.9 | 1.30x |
| ArraysSort.Long.testSort | RANDOM | 7000 | 205.4 | 53.3 | **3.85x** |
| ArraysSort.Long.testSort | RANDOM | 50000 | 2015.6 | 483.1 | **4.17x** |
| ArraysSort.Long.testSort | RANDOM | 300000 | 14100.0 | 3485.8 | **4.04x** |
| ArraysSort.Long.testSort | RANDOM | 2000000 | 108740.4 | 27978.6 | **3.89x** |
| ArraysSort.Long.testSort | REPEATED | 800 | 1.6 | 1.2 | 1.33x |
| ArraysSort.Long.testSort | REPEATED | 7000 | 15.9 | 7.4 | **2.13x** |
| ArraysSort.Long.testSort | REPEATED | 50000 | 287.8 | 44.9 | **6.41x** |
| ArraysSort.Long.testSort | REPEATED | 300000 | 1969.6 | 287.2 | **6.86x** |
| ArraysSort.Long.testSort | REPEATED | 2000000 | 12095.4 | 2821.4 | **4.29x** |
| ArraysSort.Long.testSort | SHUFFLE | 800 | 4.2 | 4.5 | 0.95x |
| ArraysSort.Long.testSort | SHUFFLE | 7000 | 82.7 | 51.2 | 1.62x |
| ArraysSort.Long.testSort | SHUFFLE | 50000 | 757.1 | 443.3 | 1.71x |
| ArraysSort.Long.testSort | SHUFFLE | 300000 | 5196.6 | 3140.8 | 1.65x |
| ArraysSort.Long.testSort | SHUFFLE | 2000000 | 34367.7 | 24646.1 | 1.39x |
| ArraysSort.Long.testSort | STAGGER | 800 | 1.9 | 2.0 | 0.98x |
| ArraysSort.Long.testSort | STAGGER | 7000 | 20.4 | 21.2 | 0.96x |
| ArraysSort.Long.testSort | STAGGER | 50000 | 111.0 | 109.3 | 1.02x |
| ArraysSort.Long.testSort | STAGGER | 300000 | 681.4 | 699.8 | 0.97x |
| ArraysSort.Long.testSort | STAGGER | 2000000 | 5127.1 | 5040.3 | 1.02x |
Thanks,
Vamsi
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1691073640
More information about the hotspot-compiler-dev
mailing list