RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14]
Srinivas Vamsi Parasa
duke at openjdk.org
Wed Aug 16 17:50:16 UTC 2023
On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski <duke at openjdk.org> wrote:
>>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs.
>>
>> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array.
>
> Hello @vamsi-parasa !
>
> Do you process negative zeros properly? From one hand -0.0f equals to 0.0f, but negative zeros must be placed before 0.0f.
> See javadoc for Arrays.sort(float[] a). The same situation with -0.0d (double type).
@iaroslavski
Hello Vladimir,
Please see the `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort intrinsic** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568)
- The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
- Please see the performance numbers below. The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup.
- For the majority of the cases, it can be seen that AVX512 sort gives good speedup.
| Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 Sort (us/op) | Spedup |
| --- | --- | --- | --- | --- | --- |
| ArraysSortTests.Double.testSort | RANDOM | 800 | 7.8 | 2.6 | 3.0 |
| ArraysSortTests.Double.testSort | RANDOM | 7000 | 238.3 | 28.0 | 8.5 |
| ArraysSortTests.Double.testSort | RANDOM | 50000 | 2217.9 | 238.4 | 9.3 |
| ArraysSortTests.Double.testSort | RANDOM | 300000 | 15096.9 | 1721.6 | 8.8 |
| ArraysSortTests.Double.testSort | RANDOM | 2000000 | 117451.3 | 13854.8 | 8.5 |
| ArraysSortTests.Double.testSort | REPEATED | 800 | 2.3 | 1.6 | 1.4 |
| ArraysSortTests.Double.testSort | REPEATED | 7000 | 42.7 | 37.2 | 1.1 |
| ArraysSortTests.Double.testSort | REPEATED | 50000 | 458.4 | 386.3 | 1.2 |
| ArraysSortTests.Double.testSort | REPEATED | 300000 | 2933.8 | 342.7 | 8.6 |
| ArraysSortTests.Double.testSort | REPEATED | 2000000 | 18759.5 | 2838.9 | 6.6 |
| ArraysSortTests.Double.testSort | STAGGER | 800 | 2.5 | 2.7 | 0.9 |
| ArraysSortTests.Double.testSort | STAGGER | 7000 | 28.2 | 28.0 | 1.0 |
| ArraysSortTests.Double.testSort | STAGGER | 50000 | 147.2 | 224.0 | 0.7 |
| ArraysSortTests.Double.testSort | STAGGER | 300000 | 1030.6 | 1566.8 | 0.7 |
| ArraysSortTests.Double.testSort | STAGGER | 2000000 | 8498.2 | 12668.3 | 0.7 |
| ArraysSortTests.Double.testSort | SHUFFLE | 800 | 4.7 | 2.7 | 1.7 |
| ArraysSortTests.Double.testSort | SHUFFLE | 7000 | 86.7 | 27.5 | 3.2 |
| ArraysSortTests.Double.testSort | SHUFFLE | 50000 | 844.7 | 233.5 | 3.6 |
| ArraysSortTests.Double.testSort | SHUFFLE | 300000 | 5764.4 | 1569.6 | 3.7 |
| ArraysSortTests.Double.testSort | SHUFFLE | 2000000 | 37797.5 | 12963.4 | 2.9 |
| ArraysSortTests.Float.testSort | RANDOM | 800 | 6.7 | 1.8 | 3.7 |
| ArraysSortTests.Float.testSort | RANDOM | 7000 | 239.2 | 18.8 | 12.7 |
| ArraysSortTests.Float.testSort | RANDOM | 50000 | 2204.3 | 177.2 | 12.4 |
| ArraysSortTests.Float.testSort | RANDOM | 300000 | 15123.9 | 1342.3 | 11.3 |
| ArraysSortTests.Float.testSort | RANDOM | 2000000 | 116263.9 | 10087.5 | 11.5 |
| ArraysSortTests.Float.testSort | REPEATED | 800 | 2.3 | 0.8 | 2.7 |
| ArraysSortTests.Float.testSort | REPEATED | 7000 | 28.2 | 6.6 | 4.3 |
| ArraysSortTests.Float.testSort | REPEATED | 50000 | 469.3 | 34.9 | 13.4 |
| ArraysSortTests.Float.testSort | REPEATED | 300000 | 2996.0 | 207.1 | 14.5 |
| ArraysSortTests.Float.testSort | REPEATED | 2000000 | 19391.7 | 1503.9 | 12.9 |
| ArraysSortTests.Float.testSort | STAGGER | 800 | 2.3 | 1.7 | 1.4 |
| ArraysSortTests.Float.testSort | STAGGER | 7000 | 20.0 | 19.0 | 1.1 |
| ArraysSortTests.Float.testSort | STAGGER | 50000 | 140.5 | 153.6 | 0.9 |
| ArraysSortTests.Float.testSort | STAGGER | 300000 | 831.9 | 1165.8 | 0.7 |
| ArraysSortTests.Float.testSort | STAGGER | 2000000 | 5591.0 | 8600.0 | 0.7 |
| ArraysSortTests.Float.testSort | SHUFFLE | 800 | 4.8 | 1.7 | 2.8 |
| ArraysSortTests.Float.testSort | SHUFFLE | 7000 | 85.1 | 18.5 | 4.6 |
| ArraysSortTests.Float.testSort | SHUFFLE | 50000 | 851.8 | 156.4 | 5.4 |
| ArraysSortTests.Float.testSort | SHUFFLE | 300000 | 5617.5 | 1204.3 | 4.7 |
| ArraysSortTests.Float.testSort | SHUFFLE | 2000000 | 37380.1 | 9040.1 | 4.1 |
| ArraysSortTests.Int.testSort | RANDOM | 800 | 6.3 | 1.3 | 4.9 |
| ArraysSortTests.Int.testSort | RANDOM | 7000 | 209.9 | 14.2 | 14.7 |
| ArraysSortTests.Int.testSort | RANDOM | 50000 | 2037.8 | 153.4 | 13.3 |
| ArraysSortTests.Int.testSort | RANDOM | 300000 | 14119.9 | 1139.2 | 12.4 |
| ArraysSortTests.Int.testSort | RANDOM | 2000000 | 111777.9 | 8509.0 | 13.1 |
| ArraysSortTests.Int.testSort | REPEATED | 800 | 1.6 | 0.5 | 3.3 |
| ArraysSortTests.Int.testSort | REPEATED | 7000 | 23.0 | 3.9 | 5.8 |
| ArraysSortTests.Int.testSort | REPEATED | 50000 | 311.2 | 20.8 | 15.0 |
| ArraysSortTests.Int.testSort | REPEATED | 300000 | 1961.6 | 138.8 | 14.1 |
| ArraysSortTests.Int.testSort | REPEATED | 2000000 | 11834.8 | 732.2 | 16.2 |
| ArraysSortTests.Int.testSort | STAGGER | 800 | 1.7 | 1.2 | 1.5 |
| ArraysSortTests.Int.testSort | STAGGER | 7000 | 15.9 | 14.2 | 1.1 |
| ArraysSortTests.Int.testSort | STAGGER | 50000 | 96.7 | 116.7 | 0.8 |
| ArraysSortTests.Int.testSort | STAGGER | 300000 | 591.1 | 949.2 | 0.6 |
| ArraysSortTests.Int.testSort | STAGGER | 2000000 | 4681.4 | 7195.2 | 0.7 |
| ArraysSortTests.Int.testSort | SHUFFLE | 800 | 4.3 | 1.2 | 3.5 |
| ArraysSortTests.Int.testSort | SHUFFLE | 7000 | 82.8 | 13.9 | 6.0 |
| ArraysSortTests.Int.testSort | SHUFFLE | 50000 | 769.5 | 138.3 | 5.6 |
| ArraysSortTests.Int.testSort | SHUFFLE | 300000 | 5076.7 | 1030.3 | 4.9 |
| ArraysSortTests.Int.testSort | SHUFFLE | 2000000 | 33627.7 | 7631.4 | 4.4 |
| ArraysSortTests.Long.testSort | RANDOM | 800 | 6.4 | 3.0 | 2.2 |
| ArraysSortTests.Long.testSort | RANDOM | 7000 | 204.9 | 33.2 | 6.2 |
| ArraysSortTests.Long.testSort | RANDOM | 50000 | 2061.8 | 281.7 | 7.3 |
| ArraysSortTests.Long.testSort | RANDOM | 300000 | 14055.6 | 1988.4 | 7.1 |
| ArraysSortTests.Long.testSort | RANDOM | 2000000 | 110750.1 | 15483.1 | 7.2 |
| ArraysSortTests.Long.testSort | REPEATED | 800 | 1.6 | 1.7 | 1.0 |
| ArraysSortTests.Long.testSort | REPEATED | 7000 | 19.8 | 35.1 | 0.6 |
| ArraysSortTests.Long.testSort | REPEATED | 50000 | 308.4 | 390.3 | 0.8 |
| ArraysSortTests.Long.testSort | REPEATED | 300000 | 1924.3 | 253.0 | 7.6 |
| ArraysSortTests.Long.testSort | REPEATED | 2000000 | 12113.7 | 1985.3 | 6.1 |
| ArraysSortTests.Long.testSort | STAGGER | 800 | 1.9 | 3.2 | 0.6 |
| ArraysSortTests.Long.testSort | STAGGER | 7000 | 18.2 | 32.8 | 0.6 |
| ArraysSortTests.Long.testSort | STAGGER | 50000 | 110.2 | 265.0 | 0.4 |
| ArraysSortTests.Long.testSort | STAGGER | 300000 | 851.0 | 1827.9 | 0.5 |
| ArraysSortTests.Long.testSort | STAGGER | 2000000 | 6601.0 | 14350.6 | 0.5 |
| ArraysSortTests.Long.testSort | SHUFFLE | 800 | 4.2 | 3.3 | 1.3 |
| ArraysSortTests.Long.testSort | SHUFFLE | 7000 | 83.0 | 32.6 | 2.5 |
| ArraysSortTests.Long.testSort | SHUFFLE | 50000 | 761.7 | 274.2 | 2.8 |
| ArraysSortTests.Long.testSort | SHUFFLE | 300000 | 5204.1 | 1999.7 | 2.6 |
| ArraysSortTests.Long.testSort | SHUFFLE | 2000000 | 33995.3 | 14655.9 | 2.3 |
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1681035789
More information about the hotspot-compiler-dev
mailing list