Re[4]: The new optimized version of Dual-Pivot Quicksort

Vladimir Yaroslavskiy vlv.spb.ru at mail.ru
Mon Nov 12 10:11:07 UTC 2018


Hi Laurent,


No information about JMH benchmarking.
I use Bentley's test suite to compare algorithms.

>Понедельник, 12 ноября 2018, 13:06 +03:00 от Laurent Bourgès <bourges.laurent at gmail.com>:
>
>Hi,
>
>Do you know if someone has written a complete JMH benchmark suite dedicated to Arrays.sort() ?
>with varying array size (trivial) but also testing lots of data distributions: (see Vladimir's tests) and possibly all variants (int, long, double, Object[] )
>
>It could be part of the standard OpenJDK JMH test suite...
>
>For now, I forked the nearly-optimal-mergesort repository on github:
>https://github.com/bourgesl/nearly-optimal-mergesort-code/tree/master/results
>
>Cheers,
>Laurent
>Le sam. 10 nov. 2018 à 12:49, Laurent Bourgès < bourges.laurent at gmail.com > a écrit :
>>Dear Vladimir & other Java sort experts,
>>
>>I made the port of the DPQS 2018.2 code last night to support a secondary array to be sorted and use preallocation (aux/run for merge sort):
>>https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/src/wildinter/net/mergesort/DualPivotQuicksort2018Ext.java
>>
>>I succeded in using this variant in Marlin renderer (dev) and it is promising. I figured out a rendering bug in 1 test wigh 65535 random segments, certainly due to array swaps in mergesort (my side)...
>>
>>I will look again at my changes to check if I miss some permutation (a // b) or made any mistake on their indices... tricky.
>>
>>PS: Please share your updated webrev when possible to rebase my changes.
>>
>>Regards,
>>Laurent
>>Le ven. 9 nov. 2018 à 10:08, Laurent Bourgès < bourges.laurent at gmail.com > a écrit :
>>>Hi Vladimir,
>>>
>>>Thank you for your attention, you are the Sort Master.
>>>
>>>Le ven. 9 nov. 2018 à 09:02, Vladimir Yaroslavskiy < vlv.spb.ru at mail.ru > a écrit :
>>>>Hi Laurent,
>>>>
>>>>The new version is still under review, there were a lot of suggestions and ideas from Doug Lea.
>>>>I needed time to apply and check them. I'm going to send final version in few days.
>>>
>>>Excellent news, so it will ship in jdk12...
>>>>
>>>>As to new method sort(a[], low, high, indices): do you mean this method in Arrays class?
>>>>Are you going to extend Arrays API?
>>>
>>>I benchmarked my MergeSort and adding extra array implies many more moves: thresholds should be adjusted and my sort is sometimes better as it makes half moves (a <-> buffer swaps), so this complementary sort should have its own tuned implementation (tests & benchmarks).
>>>
>>>I coded my sort as such need did not match any Arrays.sort() methods.
>>>Having it publicly in jdk.base would be great for any other data handling algorithm (science, AI ?)
>>>
>>>If you agree it is useful, I could try proposing an Arrays/Collection API extension like :
>>>sort(<primitive or Value>[], low, high, int[]indices)
>>>
>>>>
>>>>About new parameters workbase[]: method sort(...) in DualPivotQuicksort class (version is under review)
>>>>has extra parameter - parallel context (Sorter sorter) which has required workbase[].
>>>>If we run sorting sequentially, sorter parameter is null (no any objects) and temporary buffer
>>>>will be created inside in tryMerge method if we go ahead with merging.
>>>
>>>I will have a look. For Marlin, parallel sort is not possible as rendering shapes can be parallelized in the application (geoserver map rendering).
>>>
>>>>
>>>>I will send new sources in few days and add you in cc, so you can look at it
>>>>and find if new methods are acceptable for you. Then we can discuss required changes (if any).
>>>
>>>Perfect, I started adapting your code and will share soon the link to my github repo.
>>>
>>>>
>>>>Thank you,
>>>>Vladimir
>>>
>>>Thank you very much for your first thoughts,
>>>Laurent
>>>
>>>>
>>>>>Пятница,  9 ноября 2018, 10:44 +03:00 от Laurent Bourgès < bourges.laurent at gmail.com >:
>>>>>
>>>>>Hi,
>>>>>
>>>>>I am currently testing many sort algorithms to improve the Marlin renderer (AA 2D shape rasterizer), I integrated since OpenJDK9 and am still improving for OpenJDK12 .
>>>>>
>>>>>I created my MergeSort (top down, check for sorted parts, array / buffer swap to minimize moves, isort on small sub arrays) that sort 2 int[]: crossing + edge indices.
>>>>> It is critical as edge crossings are sorted at every scanline, data are almost sorted/reversed, not really random.
>>>>>
>>>>>3 questions:
>>>>>- why is this patch dormant ? I checked in openjdk12 repo and your changes were not integrated.
>>>>>- I need a sort() method that works with 2 arrays: data + indices (pointer like). Such sorted indices are useful to use for lookups c[idx[k]] or to perform other array permutations...
>>>>>Would you agree adding such new sort(a[], low, high, indices)
>>>>>- Marlin uses a ThreadLocal context to avoid any allocation. JDK sort() impl do not accept parameters to give any buffer[] and avoid allocations. Would you agree adding such optional parameters (like workbase[]) ?
>>>>>
>>>>>I will experiment adapting the DualPivotQuickSort in Marlin renderer and perform array sort race & rendering benchmarks.
>>>>>
>>>>>Cheers,
>>>>>Laurent
>>>>>Le ven. 19 janv. 2018 à 14:38, Vladimir Yaroslavskiy < vlv.spb.ru at mail.ru > a écrit :
>>>>>>Hi team,
>>>>>>
>>>>>>In Sept 2009 Josh Bloch, Jon Bentley and I introduced new sorting
>>>>>>algorithm, Dual-Pivot Quicksort, for primitives in JDK 7 and later
>>>>>>I suggested several improvements of Dual-Pivot Quicksort, which
>>>>>>were integrated into JDK 8.
>>>>>>
>>>>>>Now I have more optimized and faster version of Dual-Pivot Quicksort.
>>>>>>I have been working on it for the last 5 years. Please, find the
>>>>>>summary of changes below and sources / diff at webrev [1].
>>>>>>
>>>>>>All tests and benchmarking were run on the most recent build of JDK 10,
>>>>>>jdk-10-ea+39. The new version shows the better performance on different
>>>>>>inputs and guarantees n*log(n) on any data.
>>>>>>
>>>>>>The new implementation of Dual-Pivot Quicksort is 'all-in-one' version:
>>>>>>it contains one code for both parallel and sequential sorting algorithms.
>>>>>>
>>>>>>Suggested version is 10-20% faster on random data, 1.5-4 times faster
>>>>>>on nearly structured arrays, 1.5-2 times faster on period inputs.
>>>>>>
>>>>>>Parallel Dual-Pivot Quicksort is 1.5-3 times faster than current
>>>>>>algorithm based on merge sort.
>>>>>>
>>>>>>Benchmarking on the test suite, suggested by Jon Bentley, shows the
>>>>>>boost of performance in 1.4 times. This test suite contains several
>>>>>>types of inputs, such as random data, nearly structured arrays, data
>>>>>>with period and so on. Also there are several modifications of inputs
>>>>>>and parameters which are used in data building. We run sorting on all
>>>>>>combinations to compare two algorithms.
>>>>>>
>>>>>>Please let me know if you have any questions / comments.
>>>>>>
>>>>>>Summary of changes:
>>>>>>
>>>>>>DualPivotQuicksort class
>>>>>>------------------------
>>>>>>* Pivots are chosen with another step, the 1-st and 5-th candidates
>>>>>>   are taken as pivots instead of 2-nd and 4-th.
>>>>>>* Splitting into parts is related to the golden ratio
>>>>>>* Pivot candidates are sorted by combination of 5-element
>>>>>>   network sorting + insertion sort
>>>>>>* New backwards partitioning is simpler and more efficient
>>>>>>* Quicksort tuning parameters were updated
>>>>>>* Merging sort is invoked on each iteration from Quicksort
>>>>>>* Additional steps for float/double were fully rewritten
>>>>>>* Heap sort is invoked on the leftmost part
>>>>>>* Heap sort is used as a guard against quadratic time
>>>>>>* Parallel sorting is based on Dual-Pivot Quicksort
>>>>>>   instead of merge sort
>>>>>>
>>>>>>SortingAlgorithms class
>>>>>>-----------------------
>>>>>>* Merging sort and pair insertion sort were moved from
>>>>>>   DualPivotQuicksort class
>>>>>>* Pair insertion sort was simplified and optimized
>>>>>>* New nano insertion sort was introduced for tiny arrays
>>>>>>* Merging sort was fully rewritten
>>>>>>* Optimized merging partitioning is used
>>>>>>* Merging parameters were updated
>>>>>>* Merging of runs was fully rewritten
>>>>>>* Fast version of heap sort was introduced
>>>>>>* Parallel merging sort was also provided
>>>>>>
>>>>>>Sorting / ParallelSorting classes
>>>>>>---------------------------------
>>>>>>* New test cases were added
>>>>>>* Sources of these classes were unified
>>>>>>
>>>>>>Arrays class
>>>>>>------------
>>>>>>* Calls of Dual-Pivot Quicksort were updated
>>>>>>* Parallel sorting of primitives was switched to parallel
>>>>>>   Dual-Pivot Quicksort
>>>>>>* Javadoc was modified
>>>>>>
>>>>>>ArraysParallelSortHelpers class
>>>>>>-------------------------------
>>>>>>* Old implementation of parallel sorting for primitives was removed
>>>>>>* Javadoc was updated
>>>>>>
>>>>>>Thank you,
>>>>>>Vladimir
>>>>>>
>>>>>>--------------------------------------------------------------------
>>>>>>[1]  http://cr.openjdk.java.net/~alanb/DualPivotSortUpdate/webrev.01/
>>>>>>--------------------------------------------------------------------
>>>>
>>>>
>>>>-- 
>>>>Vladimir Yaroslavskiy


-- 
Vladimir Yaroslavskiy


More information about the core-libs-dev mailing list