RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]
Srinivas Vamsi Parasa
duke at openjdk.org
Mon Oct 9 18:55:43 UTC 2023
On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>>
>>
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
>> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
>> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
>> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
>> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
>> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
>> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
>> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
>> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
>> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
>> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
>> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
>> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
>> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
>> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
>> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
>> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
>> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
>> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
>> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
>> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
>> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
>> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
>> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
>> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
>> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
>> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
>> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
>> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
>> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
>> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
>> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
>> | ArraysSort.longSort | 1000 | 10.449 | 6....
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>
> - fix code style and formatting
> - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
> - Update CompileThresholdScaling only for the sort and partition intrinsics; update build script to remove nested if
> - change variable names of indexPivot* to pivotIndex*
> - Update DualPivotQuicksort.java
> - Rename arraySort and arrayPartition Java methods to sort and partition. Cleanup some comments
> - Remove the unnecessary exception in single pivot partitioning fallback method
> - Move functional interfaces close to the associated methods
> - Refactor the sort and partition intrinsics to accept method references for fallback functions
> - Refactor stub handling to use a generic function for all types
> - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86
> _Mailing list message from [Florian Weimer](mailto:fw at deneb.enyo.de) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at mail.openjdk.org):_
>
> I believe this has introduced a build failure with GCC 12.2 on Debian 12.1:
>
> Building target 'jdk' in configuration '/home/fw/build/jdk' In file included from /usr/lib/gcc/x86_64-linux-gnu/12/include/immintrin.h:49, from ?/jdk/src/java.base/linux/native/libsimdsort/avx512-common-qsort.h:60, from ?/jdk/src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp:31, from ?/jdk/src/java.base/linux/native/libsimdsort/avx512-linux-qsort.cpp:26: In function '__m512i _mm512_shuffle_epi32(__m512i, _MM_PERM_ENUM)', inlined from 'static zmm_vector<int>::zmm_t zmm_vector<int>::shuffle(zmm_t) [with unsigned char mask = 177]' at ?/jdk/src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp:96:36, inlined from 'zmm_t sort_zmm_32bit(zmm_t) [with vtype = zmm_vector<int>; zmm_t = __vector(8) long long int]' at ?/jdk/src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp:170:27: /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h:4459:50: error: '__Y' is used uninitialized [-Werror=uninitialized] 4459 | return (__m512i) __builtin_ia32_pshufd512_mask ((__v
16si) __A, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ 4460 | __mask, | ~~~~~~~ 4461 | (__v16si) | ~~~~~~~~~ 4462 | _mm512_undefined_epi32 (), | ~~~~~~~~~~~~~~~~~~~~~~~~~~ 4463 | (__mmask16) -1); | ~~~~~~~~~~~~~~~ /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h: In function 'zmm_t sort_zmm_32bit(zmm_t) [with vtype = zmm_vector<int>; zmm_t = __vector(8) long long int]': /usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h:206:11: note: '__Y' was declared here 206 | __m512i __Y = __Y; | ^~~
Hello Florian (@fweimer),
As Paul mentioned, this is an issue in GCC 12.
Will reproduce it on my side and issue a workaround [PR](https://bugs.openjdk.org/browse/JDK-8317741) shortly to address this issue.
Thanks,
Vamsi
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1753490421
More information about the hotspot-compiler-dev
mailing list