Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort

Sat Sep 12 12:13:17 UTC 2009

Sorry :( . Please ignore previous post. Warming-up yield to better results
in dual-pivot's favor.

On Sat, Sep 12, 2009 at 2:43 PM, Goktug Gokdogan <gokdogan at gmail.com> wrote:

> My absolutely non-scientific preliminary tests indicate Arrays.sort
> performs better with the same input (5000 items) in nearly all runs
> (-client).
>  public static void main(String[] args) {
> final int n = 5000;
>  int[] array = new int[n];
> int[] array2 = new int[n];
> Random random = new Random();
>  for (int i = 0; i < n; i++) {
> int r = random.nextInt(n);
> array[i] = r;
>  array2[i] = r;
> }
>
> long st1 = System.nanoTime();
>  Arrays.sort(array2);
> long st2 = System.nanoTime();
> DualPivotQuicksort.sort(array);
>  long end = System.nanoTime();
> System.out.println(st2 - st1);
> System.out.println(end - st2);
>  }
>
> I tried changing which sort runs first, but the results did not change.
> Over 100.000 items, dual-pivot consistently beats Arrays.sort.
>
> Well, this test is not a good reference though :)
>
> On Fri, Sep 11, 2009 at 5:27 PM, Vladimir Iaroslavski <iaroslavski at mail.ru
> > wrote:
>
>> Hi,
>>
>> I've tried to use local variable int ak = a[k] in the loop
>> and not found saving of time. There are 3 occurrences of a[k]
>> in each loop, but only two can be changed because third
>> comes after line: swap(a, k, great--);
>>
>> As summary: I'm changing only hardcoded constant by
>> private static final variables.
>>
>> Thank you,
>> Vladimir
>>
>>
>> Ulf Zibis wrote:
>>
>>> Am 11.09.2009 15:32, Rémi Forax schrieb:
>>>
>>>> just my two cents :
>>>> In the loop tagged "sorting" and "equals element",
>>>> a[k] can be stored in a local variable to avoid to recompute it several
>>>> time.
>>>>
>>>
>>> I add 2 cents more:
>>> Doesn't HotSpot see this, and optimize accordingly?
>>> IMO: It's time that javac should do such optimization, as there are less
>>> optimized VM's in the world.
>>>
>>> -Ulf
>>>
>>>
>>>
>>>> The algorithm use two constants: 37 and 13,
>>>> I think they should be declared as private static final.
>>>>
>>>> Rémi
>>>>
>>>>
>>>> Le 11/09/2009 12:35, Vladimir Yaroslavskiy a écrit :
>>>>
>>>>> Hello All,
>>>>>
>>>>> I'd like to share with you new Dual-Pivot Quicksort which is
>>>>> faster than the known implementations (theoretically and
>>>>> experimental). I'd like to propose to replace the JDK's
>>>>> Quicksort implementation by new one.
>>>>>
>>>>> Description
>>>>> -----------
>>>>> The classical Quicksort algorithm uses the following scheme:
>>>>>
>>>>> 1. Pick an element P, called a pivot, from the array.
>>>>> 2. Reorder the array so that all elements, which are less than
>>>>>   the pivot, come before the pivot and all elements greater than
>>>>>   the pivot come after it (equal values can go either way). After
>>>>>   this partitioning, the pivot element is in its final position.
>>>>> 3. Recursively sort the sub-array of lesser elements and the
>>>>>   sub-array of greater elements.
>>>>>
>>>>> The invariant of classical Quicksort is:
>>>>>
>>>>> [ <= p | >= p ]
>>>>>
>>>>> There are several modifications of the schema:
>>>>>
>>>>> [ < p | = p | > p ]  or  [ = p | < p | > p | = p ]
>>>>>
>>>>> But all of them use *one* pivot.
>>>>>
>>>>> The new Dual-Pivot Quicksort uses *two* pivots elements in this manner:
>>>>>
>>>>> 1. Pick an elements P1, P2, called pivots from the array.
>>>>> 2. Assume that P1 <= P2, otherwise swap it.
>>>>> 3. Reorder the array into three parts: those less than the smaller
>>>>>   pivot, those larger than the larger pivot, and in between are
>>>>>   those elements between (or equal to) the two pivots.
>>>>> 4. Recursively sort the sub-arrays.
>>>>>
>>>>> The invariant of the Dual-Pivot Quicksort is:
>>>>>
>>>>> [ < P1 | P1 <= & <= P2 } > P2 ]
>>>>>
>>>>> The new Quicksort is faster than current implementation of Quicksort
>>>>> in JDK (L. Bentley and M. Douglas McIlroy) and classical Quicksort.
>>>>>
>>>>> The full description of the Dual-Pivot Quicksort you can find
>>>>> on my page: http://iaroslavski.narod.ru/quicksort
>>>>>
>>>>> Performance tests
>>>>> -----------------
>>>>> Here is result of running on different types of input data:
>>>>>
>>>>> Client VM                          all    85%  organ  0..1
>>>>>  0..4
>>>>>             random ascend descend equal  equal pipes random 010101
>>>>> random
>>>>> Dual-Pivot:  16.83  5.31    5.47   0.35  0.68  10.59  1.06   1.02
>>>>> 2.18
>>>>>  Bentley's:  19.77  9.08   10.13   0.63  1.12  13.22  1.63   1.08
>>>>> 2.49
>>>>>
>>>>> Server VM                          all    85%  organ  0..1
>>>>>  0..4
>>>>>             random ascend descend equal  equal pipes random 010101
>>>>> random
>>>>> Dual-Pivot:  23.94  6.68    6.63   0.43  0.62  17.14  1.42   1.96
>>>>> 3.41
>>>>>  Bentley's:  25.20 10.18   10.32   2.07  1.33  16.72  2.95   1.82
>>>>> 3.39
>>>>>
>>>>> The a lot of other tests have been run under client and server mode.
>>>>> The most interesting is BentleyBasher test framework. It runs battery
>>>>> of tests for all cases:
>>>>>
>>>>> { 100, 1000, 10000, 1000000 } x
>>>>> { sawtooth, rand, stagger, plateau, shuffle } x
>>>>> { ident, reverse, reverse_front, reverse_back, sort, dither}
>>>>>
>>>>> where
>>>>>
>>>>> 100, ... , 1000000 - array length
>>>>>
>>>>> sawtooth: x[i] =i%m
>>>>> rand: x[i] = rand() % m
>>>>> stagger: x[i] = (i*m + i) % n
>>>>> plateau: x[i] = min(i, m)
>>>>> shuffle: x[i] = rand()%m? (j+=2): (k+=2)
>>>>>
>>>>> ident(x) - a copy of x
>>>>> reverse(x, 0, n) - reversed copy
>>>>> reverse_front(x, 0, n/2) - front half reversed
>>>>> reverse_back(x, n/2, n) - back half reversed
>>>>> sort(x) - an ordered copy
>>>>> dither(x) - add i%5 to x[i]
>>>>>
>>>>> Here is the result of execution:
>>>>> Server VM:
>>>>> http://spreadsheets.google.com/pub?key=t_EAWUkQ4mD3BIbOv8Fa-AQ&output=html
>>>>> Client VM:
>>>>> http://spreadsheets.google.com/pub?key=tdiMo8xleTxd23nKUObcz0Q&single=true&gid=0&output=html
>>>>>
>>>>> Mathematical investigations
>>>>> ---------------------------
>>>>> It is proved that for the Dual-Pivot Quicksort the average number of
>>>>> comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n),
>>>>> whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n)
>>>>> respectively. Full mathematical proof see in attached proof.txt
>>>>> and proof_add.txt files. Theoretical results are also confirmed
>>>>> by experimental counting of the operations.
>>>>>
>>>>> Diff between current and new implementation of Quicksort
>>>>> --------------------------------------------------------
>>>>>
>>>>> Here is the link to the diff for java.util.Arrays class:
>>>>> http://cr.openjdk.java.net/~alanb/6880672/webrev.00
>>>>>
>>>>> If you like to look and play with new algorithm,
>>>>> please, take attached class DualPivotQuicksort.java
>>>>>
>>>>> Feedback
>>>>> --------
>>>>>
>>>>> Also I'd like to share a feedback from Joshua Bloch and
>>>>> Jon Bentley who spent a lot of time investigating this
>>>>> algorithm, who gave me many advices and tips how to
>>>>> make new Quicksort better.
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: Integration of new Dual-Pivot Quicksort into JDK 7
>>>>> Date: Thu, 10 Sep 2009 07:20:11 -0700
>>>>> From: Joshua Bloch <jjb at google.com>
>>>>>
>>>>> Jon also says that Vladimir should make every reasonable improvement to
>>>>> the basic method before checking in the code. In his words, "It would
>>>>> be
>>>>> horrible to put the new code into the library, and then have someone
>>>>> else come along and speed it up by another 20% by using standard
>>>>> techniques." I believe it's not unlikely that this code may end up
>>>>> getting ported to many languages and widely deployed in much the manner
>>>>> of Bentley and McIlroy's fine sort (which is nearing 20 successful
>>>>> years
>>>>> in the field). Jon will help Vladimir do this.
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Dual-Pivot Quicksort: Next Steps
>>>>> Date: Wed, 09 Sep 2009 15:02:25 -0400
>>>>> From: Jon Bentley <jbentley at avaya.com>
>>>>>
>>>>> Vladimir, Josh,
>>>>>     I *finally* feel like I understand what is going on.  Now that I
>>>>> (think that) I see it, it seems straightforward and obvious.
>>>>>     Tony Hoare developed Quicksort in the early 1960s. I was very
>>>>> proud to make minor contributions to a particularly clean (binary)
>>>>> quicksort in the mid 1980s, to a relatively straightforward, industrial
>>>>> strength Quicksort with McIlroy in the early 1990s, and then to
>>>>> algorithms and data structures for strings with Sedgewick in the mid
>>>>> 1990s.
>>>>>     I think that Vladimir's contributions to Quicksort go way beyond
>>>>> anything that I've ever done, and rank up there with Hoare's original
>>>>> design and Sedgewick's analysis. I feel so privileged to play a very,
>>>>> very minor role in helping Vladimir with the most excellent work!
>>>>>
>>>>> -----------------------------------------------
>>>>>
>>>>> Let me know, if you have any questions/comments.
>>>>>
>>>>> Thank you,
>>>>> Vladimir
>>>>>
>>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20090912/280f7a30/attachment.html>