RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

Srinivas Vamsi Parasa duke at openjdk.org
Wed Sep 6 16:59:54 UTC 2023


On Wed, 30 Aug 2023 15:10:45 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>>> Hi Vladimir, Just verified that the test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags
>> 
>> Just to add that Sorting.java has short and long run modes. The default when running with jtreg or make run-test is the short run so that it doesn't take too long.  It might be useful to try it without -shortrun to give the intrinsic a better work out.
>
>> @AlanBateman If it helps, the changes made by @vamsi-parasa to DualPivotQuickSort.java don't change the logic as written in Java. They only carve out the functionality into separate Java methods retaining the meaning exactly as before. These Java methods are then optimized through a stub.
> 
> Hi Alan,
> 
> As Sandhya (@sviswa7) mentioned, this PR does not make any logic changes to the DualPivotQuickSort.java. 
> 
> The code was refactored to separate out the partitioning logic (dual pivot and single pivot) into separate methods. These methods are being intrinsified using AVX512 to do vectorized partitioning preserving the logic of Java side. Similarly, the methods to sort small arrays (insertionSort/mixedInsertionSort) are being intrinsified as well to use AVX512 sort while preserving the logic on Java side.
> 
> Could you please go through the changes and provide your comments and suggestions?
> 
> Thanks,
> Vamsi

> Hi Vamsi (@vamsi-parasa)!
> 
> Please see my few questions/suggestions related to method arraySort() and other code.
> 
> **1.** If size < MIN_FAST_SMALL_ARRAY_SORT_SIZE (=16), you don't go into intrinsic arraySort(), method mixedInsertionSort is invoked instead.
> 
> `if (size < MAX_MIXED_INSERTION_SORT_SIZE + bits && (bits & 1) > 0) {` ` int last = high - 3 * ((size >> 5) << 3);` ` if (size < MIN_FAST_SMALL_ARRAY_SORT_SIZE) mixedInsertionSort(a, low, last , high);` ` else arraySort(int.class, a, baseOffset, low, high, last);` ` return;` `}`
> 
> Is Java mixedInsertionSort actually faster than intrinsic arraySort() on small (< 16) arrays? What if you try to invoke arraySort() on all small arrays and don't use constant MIN_FAST_SMALL_ARRAY_SORT_SIZE at all? Like this:
> 
> `if (size < MAX_MIXED_INSERTION_SORT_SIZE + bits && (bits & 1) > 0) {` ` int last = high - 3 * ((size >> 5) << 3);` ` arraySort(int.class, a, baseOffset, low, high, last);` ` return;` `}`
> 
> Could you please check and benchmark it?
> 
> **2.** On the next step you apply the same approach when you invoke insertionSort() on the small leftmost part.
> 
> `if (size < MAX_INSERTION_SORT_SIZE) {` ` if (size < MIN_FAST_SMALL_ARRAY_SORT_SIZE) insertionSort(a, low, high);` ` else arraySort(int.class, a, baseOffset, low, high, -1);` ` return;` `}`
> 
> But this invocation happens only once, on the leftmost part only. I think we can invoke Java code and don't go to intrinsic arraySort(). I guess we will not have profit with intrinsic method here. See:
> 
> `if (size < MAX_INSERTION_SORT_SIZE) {` ` insertionSort(a, low, high);` ` return;` `}`
> 
> Could you please try this case without intrinsic and benchmark it?
> 
> **3.** You introduce constant int baseOffset = Unsafe.ARRAY_INT_BASE_OFFSET inside the loop. What if we pass the constant directly? For example:
> 
> `arraySort(int.class, a, Unsafe.ARRAY_INT_BASE_OFFSET, low, high, last);`
> 
> **3.A** The first parameter of arraySort() is elemType (int.class, long,class etc.) The second parameter base offset depends on this elem type. For example, if elemType is int.class, base offset will be Unsafe.ARRAY_INT_BASE_OFFSET. Can we eliminate base offset here and set up value inside intrinsic?
> 
> **4.** You define 'int[] pivotIndices;' at the beginning of the loop, but the indices will be used later (or not used at all). My proposal to declare pivotIndices in if-then-else, see:
> 
> `int[] pivotIndices = new int[] {e1, e5};`
> 
> **5.** Method arrayPartition has argument isDualPivot, I think we can avoid it. If isDualPivot is true, pivot indices are different. If indices are equal, it means single pivot partitioning. So, changes are:
> 
> `private static void arrayPartition(Class<?> elemType, Object array, long offset, int low, int high, int[] pivotIndices) {` ` if (pivotIndices[0] == pivotIndices[1]) partitionSinglePivot(array, low, high, pivotIndices);` ` else partitionDualPivot(array, low, high, pivotIndices);` `}`
> 
> **6.** Please add brackets for 'then' and 'else' statements according to common Java style:
> 
> `if (pivotIndices[0] == pivotIndices[1]) {` ` partitionSinglePivot(array, low, high, pivotIndices);` `} else {` ` partitionDualPivot(array, low, high, pivotIndices);` `};`
> 
> Thank you, Vladimir

Hello Vladimir (@iaroslavski)

Thank you for the suggestions!

Regarding suggestions (1) and (2): will soon provide the performance data when using small arrays (<= 16) with and without the intrinsic. 

Regarding the code refactoring suggestions in (3) to (6): I am currently working on code refactoring on both the Java and intrinsic side. Your suggestions will be incorporated in the refactoring. 

Will inform you after the new changes have been pushed to github.

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1708766999


More information about the build-dev mailing list