New portion of improvements for Dual-Pivot Quicksort

Mon Apr 26 21:50:08 UTC 2010

Hello, everyone!

I've investigated the implementation of the Dual-Pivot Quicksort
which is used for sorting primitives and here is my result:

http://cr.openjdk.java.net/~alanb/6947216/webrev.00

New implementation of Dual-Pivot Quicksort is faster
than previous one of 12% for client VM and few percents for
server VM. Tests on Bentley's test suite (Win XP, JDK 7,
build 1.7.0-ea-b84, n = 1000000) show geometry mean 0.88
for client VM and 0.98-1.00 for server VM.

In compare with sorting from JDK 6 by Jon L. Bentley and
M. Douglas McIlroy's (with one pivot) the ratio is 0.61 / 0.50
(client / server).

See the execution time for sorting array of 2`000`000 int elements
50 times, client / server VM, in milliseconds:

random
 new: 16723 18776
jdk7: 17571 18975
jdk6: 22241 26524

ascendant
 new:  3541  4702
jdk7:  4486  4669
jdk6:  8667  7978

descendant
 new:  3854  4907
jdk7:  4942  5034
jdk6:  8787  8243

equal
 new:   234   281
jdk7:   291   230
jdk6:   602  1018

organ pipes
 new:  7673  8613
jdk7:  8167  8993
jdk6: 11902 14044

stagger 1
 new:  7648  8591
jdk7:  8161  8979
jdk6: 11908 13810

stagger 2
 new:  8349  9299
jdk7: 10968 11916
jdk6: 12194 14734

stagger 4
 new:  8475  9622
jdk7:  9221  9682
jdk6: 10523 12006

stagger 8
 new:  9321 10689
jdk7: 11125 12387
jdk6: 13829 16214

period 1..2
 new:   758   751
jdk7:   870   754
jdk6:  1038  1227

period 1..4
 new:  1004   963
jdk7:  1365  1209
jdk6:  1511  1847

period 1..8
 new:  1588  1573
jdk7:  1599  1790
jdk6:  2602  3045

random 1..2
 new:  1327  1125
jdk7:  1362  1496
jdk6:  1531  2182

random 1..4
 new:  1830  2118
jdk7:  1851  2236
jdk6:  2292  3025

where stagger(m) is array like a[i] = i * (m + 1) % length.

The summary of changes is:

1. For sorting small arrays is used insertion sort with sentinel
instead of traditional, which has the structure:

for (int i = left + 1; i <= right; i++) {
    for (j = i; j > left && a[j-1] > a[j]; j--) {
        swap(a[i], a[j-1]);
    }
}

Note that range check j > left is performed on each iteration,
but really helps very rare. To avoid this expensive range check,
it was suggested to set minimum value (negative infinity) on the
first position. This type of suggestion is used in new version:

if left bound > 0, we can put sentinel on a[left - 1], do insertion
sort without expensive check range, and then restore a[left - 1]
value. If left == 0, traditional insertion sort is used. Please,
look at the webrev for details.

2. In previous implementation 5 evenly spaced elements

sixth = length / 6;
a[sixth], a[2 * sixth], a[3 * sixth], a[4 * sixth], a[5 * sixth]

were used as candidates of pivots elements. This case is very
sensitive for period inputs, especially by 6. The new suggestion
is to take 5 center evenly spaced elements like this:

int seventh = length / 7;
int midpoint = (left + right) >>> 1;

a[midpoint - 2 * seventh], a[midpoint -     seventh], a[midpoint],
a[midpoint +     seventh], a[midpoint + 2 * seventh]

and moreover, the seventh is calculated inexpensively:
seventh = (length >>> 3) + (length >>> 6) + 1;

This schema works the same on random, ascendant, descendant, equal
inputs, but much better for period / stagger.

3. The whole structure

<choose pivots>

if (pivotsDiffer) {
    <do partitioning for two pivots>
}
else {
    <do partitioning for one pivot>
}

<sort left and right parts>

if (!pivotsDiffer) {
    return;
}
<swap internal pivot values to ends>

<sort center part>

was modified to:
----------------

<choose pivots>

if (pivot1 < pivot2) {
    <do partitioning for two pivots>
    <swap pivots into their final positions>
    <sort left and right parts>
    <swap internal pivot values to ends>
    <sort center part>
}
else {
    <do partitioning for one pivot>
    <sort left and right parts>
}

4. Partitioning for both cases have not been changed at all.

5. Minor javadoc and format changes.

Please, review new implementation,
any comments / suggestions are welcome!

Thank you,
Vladimir