Question on sorting
Vladimir Iaroslavski
iaroslavski at mail.ru
Wed Aug 4 13:43:23 UTC 2010
Hello Dmytro,
Thank you for investigation, the results are interesting.
I prepared simpler test [1], which has no recursion, it just
sorts 5 candidates and compare all elements with first (or second)
candidate. I run on array of 2000000 elements and result is:
e1/e2 267/460 on random data
e1/e2 193/200 on sorted array
and interesting fact: if I change comparison from "a[i] < pivot"
to "a[i] == pivot", it shows 189 for e1 and e2, for random and
sorted data.
I tried new schema [1/2, 1/4, 1/4], but it shows almost the same
time as [1/3, 1/3, 1/3], no improvements.
Thank you,
Vladimir
[1] ---------------------------------------------------------------
public static void sort(int[] a) {
int step = a.length >>> 3;
int e1 = step;
int e2 = e1 + step;
int e3 = e2 + step;
int e4 = e3 + step;
int e5 = e4 + step;
if (a[e1] > a[e2]) { int t = a[e1]; a[e1] = a[e2]; a[e2] = t; }
if (a[e4] > a[e5]) { int t = a[e4]; a[e4] = a[e5]; a[e5] = t; }
if (a[e1] > a[e3]) { int t = a[e1]; a[e1] = a[e3]; a[e3] = t; }
if (a[e2] > a[e3]) { int t = a[e2]; a[e2] = a[e3]; a[e3] = t; }
if (a[e1] > a[e4]) { int t = a[e1]; a[e1] = a[e4]; a[e4] = t; }
if (a[e3] > a[e4]) { int t = a[e3]; a[e3] = a[e4]; a[e4] = t; }
if (a[e2] > a[e5]) { int t = a[e2]; a[e2] = a[e5]; a[e5] = t; }
if (a[e2] > a[e3]) { int t = a[e2]; a[e2] = a[e3]; a[e3] = t; }
if (a[e4] > a[e5]) { int t = a[e4]; a[e4] = a[e5]; a[e5] = t; }
int pivot = a[e1]; // or a[e2]
for (int i = 0; i < a.length; i++) {
if (a[i] < pivot);
}
}
[/1] ---------------------------------------------------------------
Dmytro Sheyko wrote:
> Hi Vladimir,
>
> There could be many reasons for this.
> The verisimilar ones are imprecise time measurement with highly
> dispersed results and biased samples.
>
> Another reason is that an attempt to divide whole array into equal-size
> partition not always gives us the best result. And hence choosing
> "wrong" pivots could make partitions balanced slightly better.
>
> Let me clarify this counter-intuitive statement regarding not-equal
> partitioning.
>
> Consider following quite straightforward dual pivot quicksort.
>
> sort(a[]) {
> pivot1, pivot2 = choosePivots(a);
>
> // partitioning
> forall (a[k] in a) {
> if (a[k] < pivot1) {
> a[k] goes to the left partition
> } else if (a[k] > pivot2) {
> a[k] goes to the right partition
> } else {
> a[k] goes to the middle partition
> }
> }
>
> sort(left partition);
> sort(middle partition);
> sort(right partition);
> }
>
> Here you can see that during partitioning every item is compared with
> one or two pivots. In our case item is compared with
> the second pivot only if it greater than the first one. So what is the
> average number of comparison during partitioning?
> If we succeed to choose pivots so that they divide whole array into 3
> equal partitions we have 1*(1/3) + 2*(1/3) + 2*(1/3) = 5/3 per item.
> Is this ideal? What if pivots divide array in following proportions 1/2
> 1/4 1/4? Then we have 1*(1/2) + 2*(1/4) + 2*(1/4) = 3/2.
> 3/2 is less than 5/3.
>
> Let's find now ideal proportions of partitions taking into account
> number of comparison of sorting in whole.
>
> Assume that number of comparison is A*n*ln(n) + B*n + o(n) and we divide
> whole array in following proportions
> [alpha, (1 - alpha)/2, (1 - alpha)/2], where 0 < alpha < 1.
>
> A*n*ln(n) + B*n =
> = n * (alpha + 2*(1 - alpha)) { partitioning }
> + A * alpha*n * ln(alpha*n) + B * alpha*n { sorting left partition }
> + 2 * A * (1-alpha)*n/2 * ln((1-alpha)*n/2) + 2 * B * (1-alpha)*n/2 {
> sorting middle and right partitions }
>
> A*n*ln(n) + B*n =
> = A*alpha*n*ln(n) + A*(1-alpha)*n*ln(n) +
> + B*alpha*n + B*(1-alpha)*n +
> + n * (alpha + 2*(1-alpha))
> + A*alpha*n*ln(alpha) + A*(1-alpha)*n*ln((1-alpha)/2)
>
> 0 = (alpha + 2*(1-alpha)) + A*alpha*ln(alpha) + A*(1-alpha)*ln((1-alpha)/2)
>
> A = (alpha - 2) / (alpha*ln(alpha) + (1-alpha)*ln((1-alpha)/2))
>
> alpha A
> 1/12 2.078316236
> 2/12 1.783079278
> 3/12 1.617083005
> 4/12 1.517065378
> 5/12 1.461274369
> 6/12 1.442695041 !!!
> 7/12 1.463491681
> 8/12 1.536871653
> 9/12 1.699242413
> 10/12 2.060936332
> 11/12 3.143757518
>
> It appears that the best alpha is about 1/2. Thus ideal partition is
> something like [1/2, 1/4, 1/4].
>
> Of course, these consideration does not apply to the real DPQ
> completely. This is because in real DPQ every item is not compared with
> pivots in well defined order and
> real DPQ contains numerous special cases, which make it harder to analyze.
>
> Regards,
> Dmytro Sheyko
>
> > From: iaroslavski at mail.ru
> > To: core-libs-dev at openjdk.java.net
> > Subject: Question on sorting
> > Date: Fri, 30 Jul 2010 22:55:00 +0400
> > CC: iaroslavski at mail.ru
> >
> > Hello,
> >
> > I have performance question in sorting.
> >
> > In an implementation of Dual-Pivot Quicksort 5 elements
> >
> > a[e1], a[e2], a[e3], a[e4], a[e5]
> >
> > where e1 = len/6, e2 = len/3, e3 = len/2, e4 = len*2/3, e5 = len*5/6,
> > are sorted and then elements a[e2], a[e4] are taken as pivots.
> >
> > but if I take a[e1] and a[e3] as pivots, it works 2-3% faster on
> > *random* inputs.
> >
> > I tried different sorting for these 5 elements: network, bubble,
> > insertion - with a[e1] and a[e3] pivots it is faster than with
> > a[e2] and a[e4].
> >
> > If I sort these 5 elements in descending order, it works faster
> > with a[e5] and a[e3] pivots.
> >
> > Do you have any idea why it happens?
> >
> > Thank you,
> > Vladimir
More information about the core-libs-dev
mailing list