Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort
Neal Gafter
neal at gafter.com
Fri Sep 11 23:08:49 UTC 2009
Vadim-
It would be very interesting if something along these lines could be made
practical. It isn't obvious how to do step (3) in place. Either you end up
allocating extra storage to do it, in which case you might as well have used
a merge sort, or you end up doing some extra shuffling around of the data,
in which case it is probably more expensive than the dual-partition
version. Perhaps there is some technique I'm not thinking of.
Cheers,
Neal
On Fri, Sep 11, 2009 at 3:48 PM, Vadim Ponomarenko <vadim123 at gmail.com>wrote:
> Vladimir Yaroslavskiy <Vladimir.Yaroslavskiy at ...> writes:
> > I'd like to share with you new Dual-Pivot Quicksort which is
> > faster than the known implementations (theoretically and
> > experimental). I'd like to propose to replace the JDK's
> > Quicksort implementation by new one.
>
> This is a great idea; as a mathematician I immediately want to generalize
> it.
> Unfortunately, I lack the computing skill to test my generalization in
> practice.
>
> Suppose that instead of 2 pivots we have m (randomly chosen) pivots, where
> m=sqrt(n+1)-1.
> Step 1: Sort the m pivots, recursively.
> Step 2: Build a balanced binary tree from the pivots.
> Step 3: Sort the remaining n-m elements into (m+1) categories, using the
> BBT.
> Step 4: Sort each of the (m+1) categories, recursively.
>
> Let S(n) denote the number of steps needed for this algorithm. Calculating
> roughly:
> Step 1 requires S(m) steps.
> Step 2 requires m steps (it is easy since the pivots are already sorted).
> Step 3 requires (n-m)(log m) comparisons.
> Step 4 requires, on average, (m+1)S((n-m)/(m+1)) steps.
>
> m was chosen so that (n-m)/(m+1)=m, so S(n)=(m+2)S(m)+m+(n-m)(log m) <=
> (m+2)
> S(m)+ n log m. Crudely, we replace both m and m+2 by sqrt(n), to get as
> approximation:
> S(n) <= 0.5 n log n + n^0.5 [ 0.25 n^0.5 log n + n^0.25 [ 0.125 n^0.25 log
> n
> +...]]] = 0.5 n log n [1 + 0.5 + 0.25 + ...] = n log n. This appears to
> be
> better than 2 n log n. This argument can be made precise (the only really
> questionable issue is replacing m+2 by m) by first getting a crude but
> correct upper bound for 2S(m), which gets absorbed into the -mlogm term.
>
> Unfortunately, I'm not up to the detailed comparison vs. swap analysis as
> done by Mr. Yaroslavskiy, but it seems to me that, if nothing else, having
> an efficient algorithm to partition the non-pivot elements should provide a
> savings.
>
> Vadim Ponomarenko
> Dept. of Mathematics and Statistics
> San Diego State University
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20090911/86766558/attachment.html>
More information about the core-libs-dev
mailing list