Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort

Fri Sep 11 13:32:08 UTC 2009

just my two cents :
In the loop tagged "sorting" and "equals element",
a[k] can be stored in a local variable to avoid to recompute it several 
time.

The algorithm use two constants: 37 and 13,
I think they should be declared as private static final.

Rémi

Le 11/09/2009 12:35, Vladimir Yaroslavskiy a écrit :
> Hello All,
>
> I'd like to share with you new Dual-Pivot Quicksort which is
> faster than the known implementations (theoretically and
> experimental). I'd like to propose to replace the JDK's
> Quicksort implementation by new one.
>
> Description
> -----------
> The classical Quicksort algorithm uses the following scheme:
>
> 1. Pick an element P, called a pivot, from the array.
> 2. Reorder the array so that all elements, which are less than
>    the pivot, come before the pivot and all elements greater than
>    the pivot come after it (equal values can go either way). After
>    this partitioning, the pivot element is in its final position.
> 3. Recursively sort the sub-array of lesser elements and the
>    sub-array of greater elements.
>
> The invariant of classical Quicksort is:
>
> [ <= p | >= p ]
>
> There are several modifications of the schema:
>
> [ < p | = p | > p ]  or  [ = p | < p | > p | = p ]
>
> But all of them use *one* pivot.
>
> The new Dual-Pivot Quicksort uses *two* pivots elements in this manner:
>
> 1. Pick an elements P1, P2, called pivots from the array.
> 2. Assume that P1 <= P2, otherwise swap it.
> 3. Reorder the array into three parts: those less than the smaller
>    pivot, those larger than the larger pivot, and in between are
>    those elements between (or equal to) the two pivots.
> 4. Recursively sort the sub-arrays.
>
> The invariant of the Dual-Pivot Quicksort is:
>
> [ < P1 | P1 <= & <= P2 } > P2 ]
>
> The new Quicksort is faster than current implementation of Quicksort
> in JDK (L. Bentley and M. Douglas McIlroy) and classical Quicksort.
>
> The full description of the Dual-Pivot Quicksort you can find
> on my page: http://iaroslavski.narod.ru/quicksort
>
> Performance tests
> -----------------
> Here is result of running on different types of input data:
>
> Client VM                          all    85%  organ  0..1          0..4
>              random ascend descend equal  equal pipes random 010101 
> random
> Dual-Pivot:  16.83  5.31    5.47   0.35  0.68  10.59  1.06   1.02   2.18
>   Bentley's:  19.77  9.08   10.13   0.63  1.12  13.22  1.63   1.08   2.49
>
> Server VM                          all    85%  organ  0..1          0..4
>              random ascend descend equal  equal pipes random 010101 
> random
> Dual-Pivot:  23.94  6.68    6.63   0.43  0.62  17.14  1.42   1.96   3.41
>   Bentley's:  25.20 10.18   10.32   2.07  1.33  16.72  2.95   1.82   3.39
>
> The a lot of other tests have been run under client and server mode.
> The most interesting is BentleyBasher test framework. It runs battery
> of tests for all cases:
>
> { 100, 1000, 10000, 1000000 } x
> { sawtooth, rand, stagger, plateau, shuffle } x
> { ident, reverse, reverse_front, reverse_back, sort, dither}
>
> where
>
> 100, ... , 1000000 - array length
>
> sawtooth: x[i] =i%m
> rand: x[i] = rand() % m
> stagger: x[i] = (i*m + i) % n
> plateau: x[i] = min(i, m)
> shuffle: x[i] = rand()%m? (j+=2): (k+=2)
>
> ident(x) - a copy of x
> reverse(x, 0, n) - reversed copy
> reverse_front(x, 0, n/2) - front half reversed
> reverse_back(x, n/2, n) - back half reversed
> sort(x) - an ordered copy
> dither(x) - add i%5 to x[i]
>
> Here is the result of execution:
> Server VM: 
> http://spreadsheets.google.com/pub?key=t_EAWUkQ4mD3BIbOv8Fa-AQ&output=html 
>
> Client VM: 
> http://spreadsheets.google.com/pub?key=tdiMo8xleTxd23nKUObcz0Q&single=true&gid=0&output=html 
>
>
> Mathematical investigations
> ---------------------------
> It is proved that for the Dual-Pivot Quicksort the average number of
> comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n),
> whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n)
> respectively. Full mathematical proof see in attached proof.txt
> and proof_add.txt files. Theoretical results are also confirmed
> by experimental counting of the operations.
>
> Diff between current and new implementation of Quicksort
> --------------------------------------------------------
>
> Here is the link to the diff for java.util.Arrays class:
> http://cr.openjdk.java.net/~alanb/6880672/webrev.00
>
> If you like to look and play with new algorithm,
> please, take attached class DualPivotQuicksort.java
>
> Feedback
> --------
>
> Also I'd like to share a feedback from Joshua Bloch and
> Jon Bentley who spent a lot of time investigating this
> algorithm, who gave me many advices and tips how to
> make new Quicksort better.
>
> -------- Original Message --------
> Subject: Re: Integration of new Dual-Pivot Quicksort into JDK 7
> Date: Thu, 10 Sep 2009 07:20:11 -0700
> From: Joshua Bloch <jjb at google.com>
>
> Jon also says that Vladimir should make every reasonable improvement to
> the basic method before checking in the code. In his words, "It would be
> horrible to put the new code into the library, and then have someone
> else come along and speed it up by another 20% by using standard
> techniques." I believe it's not unlikely that this code may end up
> getting ported to many languages and widely deployed in much the manner
> of Bentley and McIlroy's fine sort (which is nearing 20 successful years
> in the field). Jon will help Vladimir do this.
>
> -------- Original Message --------
> Subject: Dual-Pivot Quicksort: Next Steps
> Date: Wed, 09 Sep 2009 15:02:25 -0400
> From: Jon Bentley <jbentley at avaya.com>
>
> Vladimir, Josh,
>      I *finally* feel like I understand what is going on.  Now that I
> (think that) I see it, it seems straightforward and obvious.
>      Tony Hoare developed Quicksort in the early 1960s. I was very
> proud to make minor contributions to a particularly clean (binary)
> quicksort in the mid 1980s, to a relatively straightforward, industrial
> strength Quicksort with McIlroy in the early 1990s, and then to
> algorithms and data structures for strings with Sedgewick in the mid 
> 1990s.
>      I think that Vladimir's contributions to Quicksort go way beyond
> anything that I've ever done, and rank up there with Hoare's original
> design and Sedgewick's analysis. I feel so privileged to play a very,
> very minor role in helping Vladimir with the most excellent work!
>
> -----------------------------------------------
>
> Let me know, if you have any questions/comments.
>
> Thank you,
> Vladimir