RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14]
Srinivas Vamsi Parasa
duke at openjdk.org
Wed Aug 16 18:25:13 UTC 2023
On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski <duke at openjdk.org> wrote:
>>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs.
>>
>> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array.
>
> Hello @vamsi-parasa !
>
> Do you process negative zeros properly? From one hand -0.0f equals to 0.0f, but negative zeros must be placed before 0.0f.
> See javadoc for Arrays.sort(float[] a). The same situation with -0.0d (double type).
@iaroslavski
Hello Vladimir,
Please see the `Arrays.sort()` performance comparison between the **enhanced DPQS/Radix sort #13568** vs. **AVX512 sort intrinsic** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568)
- The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
- Please see the performance numbers below. The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup.
- Space complexity was not benchmarked as AVX512 sort is an in-place sorting algorithm while the Radix sort needs extra memory.
| Benchmark | Mode | Size | Enhanced DPQS/Radix (us/op) | AVX512 Sort (us/op) | Speedup |
| --- | --- | --- | --- | --- | --- |
| ArraysSortTests.Double.testSort | RANDOM | 800 | 8.4 | 2.6 | 3.3x |
| ArraysSortTests.Double.testSort | RANDOM | 7000 | 94.4 | 28.0 | 3.4x |
| ArraysSortTests.Double.testSort | RANDOM | 50000 | 634.9 | 238.4 | 2.7x |
| ArraysSortTests.Double.testSort | RANDOM | 300000 | 3887.6 | 1721.6 | 2.3x |
| ArraysSortTests.Double.testSort | RANDOM | 2000000 | 29348.9 | 13854.8 | 2.1x |
| ArraysSortTests.Double.testSort | REPEATED | 800 | 2.0 | 1.6 | 1.2x |
| ArraysSortTests.Double.testSort | REPEATED | 7000 | 39.6 | 37.2 | 1.1x |
| ArraysSortTests.Double.testSort | REPEATED | 50000 | 470.0 | 386.3 | 1.2x |
| ArraysSortTests.Double.testSort | REPEATED | 300000 | 2940.8 | 342.7 | 8.6x |
| ArraysSortTests.Double.testSort | REPEATED | 2000000 | 19361.4 | 2838.9 | 6.8x |
| ArraysSortTests.Double.testSort | STAGGER | 800 | 2.3 | 2.7 | 0.8x |
| ArraysSortTests.Double.testSort | STAGGER | 7000 | 25.6 | 28.0 | 0.9x |
| ArraysSortTests.Double.testSort | STAGGER | 50000 | 159.8 | 224.0 | 0.7x |
| ArraysSortTests.Double.testSort | STAGGER | 300000 | 945.3 | 1566.8 | 0.6x |
| ArraysSortTests.Double.testSort | STAGGER | 2000000 | 6398.7 | 12668.3 | 0.5x |
| ArraysSortTests.Double.testSort | SHUFFLE | 800 | 4.9 | 2.7 | 1.8x |
| ArraysSortTests.Double.testSort | SHUFFLE | 7000 | 55.9 | 27.5 | 2.0x |
| ArraysSortTests.Double.testSort | SHUFFLE | 50000 | 419.8 | 233.5 | 1.8x |
| ArraysSortTests.Double.testSort | SHUFFLE | 300000 | 2636.2 | 1569.6 | 1.7x |
| ArraysSortTests.Double.testSort | SHUFFLE | 2000000 | 21131.7 | 12963.4 | 1.6x |
| ArraysSortTests.Float.testSort | RANDOM | 800 | 7.4 | 1.8 | 4.1x |
| ArraysSortTests.Float.testSort | RANDOM | 7000 | 46.1 | 18.8 | 2.4x |
| ArraysSortTests.Float.testSort | RANDOM | 50000 | 328.5 | 177.2 | 1.9x |
| ArraysSortTests.Float.testSort | RANDOM | 300000 | 1960.5 | 1342.3 | 1.5x |
| ArraysSortTests.Float.testSort | RANDOM | 2000000 | 14502.4 | 10087.5 | 1.4x |
| ArraysSortTests.Float.testSort | REPEATED | 800 | 2.0 | 0.8 | 2.4x |
| ArraysSortTests.Float.testSort | REPEATED | 7000 | 30.6 | 6.6 | 4.6x |
| ArraysSortTests.Float.testSort | REPEATED | 50000 | 369.0 | 34.9 | 10.6x |
| ArraysSortTests.Float.testSort | REPEATED | 300000 | 2937.7 | 207.1 | 14.2x |
| ArraysSortTests.Float.testSort | REPEATED | 2000000 | 19008.3 | 1503.9 | 12.6x |
| ArraysSortTests.Float.testSort | STAGGER | 800 | 2.2 | 1.7 | 1.3x |
| ArraysSortTests.Float.testSort | STAGGER | 7000 | 25.7 | 19.0 | 1.4x |
| ArraysSortTests.Float.testSort | STAGGER | 50000 | 151.2 | 153.6 | 1.0x |
| ArraysSortTests.Float.testSort | STAGGER | 300000 | 873.0 | 1165.8 | 0.7x |
| ArraysSortTests.Float.testSort | STAGGER | 2000000 | 5720.0 | 8600.0 | 0.7x |
| ArraysSortTests.Float.testSort | SHUFFLE | 800 | 4.9 | 1.7 | 2.9x |
| ArraysSortTests.Float.testSort | SHUFFLE | 7000 | 41.6 | 18.5 | 2.3x |
| ArraysSortTests.Float.testSort | SHUFFLE | 50000 | 362.9 | 156.4 | 2.3x |
| ArraysSortTests.Float.testSort | SHUFFLE | 300000 | 2067.8 | 1204.3 | 1.7x |
| ArraysSortTests.Float.testSort | SHUFFLE | 2000000 | 14591.3 | 9040.1 | 1.6x |
| ArraysSortTests.Int.testSort | RANDOM | 800 | 7.0 | 1.3 | 5.4x |
| ArraysSortTests.Int.testSort | RANDOM | 7000 | 30.4 | 14.2 | 2.1x |
| ArraysSortTests.Int.testSort | RANDOM | 50000 | 255.2 | 153.4 | 1.7x |
| ArraysSortTests.Int.testSort | RANDOM | 300000 | 1618.5 | 1139.2 | 1.4x |
| ArraysSortTests.Int.testSort | RANDOM | 2000000 | 11557.7 | 8509.0 | 1.4x |
| ArraysSortTests.Int.testSort | REPEATED | 800 | 1.2 | 0.5 | 2.5x |
| ArraysSortTests.Int.testSort | REPEATED | 7000 | 10.6 | 3.9 | 2.7x |
| ArraysSortTests.Int.testSort | REPEATED | 50000 | 192.7 | 20.8 | 9.3x |
| ArraysSortTests.Int.testSort | REPEATED | 300000 | 1952.6 | 138.8 | 14.1x |
| ArraysSortTests.Int.testSort | REPEATED | 2000000 | 12969.4 | 732.2 | 17.7x |
| ArraysSortTests.Int.testSort | STAGGER | 800 | 1.5 | 1.2 | 1.3x |
| ArraysSortTests.Int.testSort | STAGGER | 7000 | 13.2 | 14.2 | 0.9x |
| ArraysSortTests.Int.testSort | STAGGER | 50000 | 94.1 | 116.7 | 0.8x |
| ArraysSortTests.Int.testSort | STAGGER | 300000 | 620.0 | 949.2 | 0.7x |
| ArraysSortTests.Int.testSort | STAGGER | 2000000 | 4279.9 | 7195.2 | 0.6x |
| ArraysSortTests.Int.testSort | SHUFFLE | 800 | 4.6 | 1.2 | 3.7x |
| ArraysSortTests.Int.testSort | SHUFFLE | 7000 | 33.2 | 13.9 | 2.4x |
| ArraysSortTests.Int.testSort | SHUFFLE | 50000 | 385.1 | 138.3 | 2.8x |
| ArraysSortTests.Int.testSort | SHUFFLE | 300000 | 2031.9 | 1030.3 | 2.0x |
| ArraysSortTests.Int.testSort | SHUFFLE | 2000000 | 20460.8 | 7631.4 | 2.7x |
| ArraysSortTests.Long.testSort | RANDOM | 800 | 6.8 | 3.0 | 2.3x |
| ArraysSortTests.Long.testSort | RANDOM | 7000 | 83.6 | 33.2 | 2.5x |
| ArraysSortTests.Long.testSort | RANDOM | 50000 | 616.0 | 281.7 | 2.2x |
| ArraysSortTests.Long.testSort | RANDOM | 300000 | 3752.6 | 1988.4 | 1.9x |
| ArraysSortTests.Long.testSort | RANDOM | 2000000 | 28236.2 | 15483.1 | 1.8x |
| ArraysSortTests.Long.testSort | REPEATED | 800 | 1.3 | 1.7 | 0.8x |
| ArraysSortTests.Long.testSort | REPEATED | 7000 | 19.5 | 35.1 | 0.6x |
| ArraysSortTests.Long.testSort | REPEATED | 50000 | 309.8 | 390.3 | 0.8x |
| ArraysSortTests.Long.testSort | REPEATED | 300000 | 2046.2 | 253.0 | 8.1x |
| ArraysSortTests.Long.testSort | REPEATED | 2000000 | 13105.2 | 1985.3 | 6.6x |
| ArraysSortTests.Long.testSort | STAGGER | 800 | 1.6 | 3.2 | 0.5x |
| ArraysSortTests.Long.testSort | STAGGER | 7000 | 13.3 | 32.8 | 0.4x |
| ArraysSortTests.Long.testSort | STAGGER | 50000 | 116.3 | 265.0 | 0.4x |
| ArraysSortTests.Long.testSort | STAGGER | 300000 | 684.3 | 1827.9 | 0.4x |
| ArraysSortTests.Long.testSort | STAGGER | 2000000 | 5022.3 | 14350.6 | 0.3x |
| ArraysSortTests.Long.testSort | SHUFFLE | 800 | 4.4 | 3.3 | 1.3x |
| ArraysSortTests.Long.testSort | SHUFFLE | 7000 | 39.9 | 32.6 | 1.2x |
| ArraysSortTests.Long.testSort | SHUFFLE | 50000 | 614.9 | 274.2 | 2.2x |
| ArraysSortTests.Long.testSort | SHUFFLE | 300000 | 2546.9 | 1999.7 | 1.3x |
| ArraysSortTests.Long.testSort | SHUFFLE | 2000000 | 31300.8 | 14655.9 | 2.1x |
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1681080351
More information about the hotspot-compiler-dev
mailing list