New portion of improvements for Dual-Pivot Quicksort

Thu Apr 29 11:05:47 UTC 2010

Hello,

Thank you for pointing to this check, I have run test again and it
works little it faster
(this check was necessary for old version of insertion sort):

60.90% / 48.35% instead of 61.00% /  51.00% (client / server).

I attached new version of the class (+ minor javdoc changes).

Best regards,
Vladimir

On Thu, Apr 29, 2010 at 12:18 PM, Goktug Gokdogan <gokdogan at gmail.com> wrote:
> One quick comment: you can consider moving trivial array check so it will be
> executed only if insertion sort threshold check passes:
>
> if (length < INSERTION_SORT_THRESHOLD) {
>   if (length < 2) {
>     return;
>   }
>   ....
>
> On Mon, Apr 26, 2010 at 2:50 PM, Vladimir Iaroslavski
> <vladimir.iaroslavski at googlemail.com> wrote:
>>
>> Hello, everyone!
>>
>> I've investigated the implementation of the Dual-Pivot Quicksort
>> which is used for sorting primitives and here is my result:
>>
>> http://cr.openjdk.java.net/~alanb/6947216/webrev.00
>>
>> New implementation of Dual-Pivot Quicksort is faster
>> than previous one of 12% for client VM and few percents for
>> server VM. Tests on Bentley's test suite (Win XP, JDK 7,
>> build 1.7.0-ea-b84, n = 1000000) show geometry mean 0.88
>> for client VM and 0.98-1.00 for server VM.
>>
>> In compare with sorting from JDK 6 by Jon L. Bentley and
>> M. Douglas McIlroy's (with one pivot) the ratio is 0.61 / 0.50
>> (client / server).
>>
>> See the execution time for sorting array of 2`000`000 int elements
>> 50 times, client / server VM, in milliseconds:
>>
>> random
>>  new: 16723 18776
>> jdk7: 17571 18975
>> jdk6: 22241 26524
>>
>> ascendant
>>  new:  3541  4702
>> jdk7:  4486  4669
>> jdk6:  8667  7978
>>
>> descendant
>>  new:  3854  4907
>> jdk7:  4942  5034
>> jdk6:  8787  8243
>>
>> equal
>>  new:   234   281
>> jdk7:   291   230
>> jdk6:   602  1018
>>
>> organ pipes
>>  new:  7673  8613
>> jdk7:  8167  8993
>> jdk6: 11902 14044
>>
>> stagger 1
>>  new:  7648  8591
>> jdk7:  8161  8979
>> jdk6: 11908 13810
>>
>> stagger 2
>>  new:  8349  9299
>> jdk7: 10968 11916
>> jdk6: 12194 14734
>>
>> stagger 4
>>  new:  8475  9622
>> jdk7:  9221  9682
>> jdk6: 10523 12006
>>
>> stagger 8
>>  new:  9321 10689
>> jdk7: 11125 12387
>> jdk6: 13829 16214
>>
>> period 1..2
>>  new:   758   751
>> jdk7:   870   754
>> jdk6:  1038  1227
>>
>> period 1..4
>>  new:  1004   963
>> jdk7:  1365  1209
>> jdk6:  1511  1847
>>
>> period 1..8
>>  new:  1588  1573
>> jdk7:  1599  1790
>> jdk6:  2602  3045
>>
>> random 1..2
>>  new:  1327  1125
>> jdk7:  1362  1496
>> jdk6:  1531  2182
>>
>> random 1..4
>>  new:  1830  2118
>> jdk7:  1851  2236
>> jdk6:  2292  3025
>>
>> where stagger(m) is array like a[i] = i * (m + 1) % length.
>>
>> The summary of changes is:
>>
>> 1. For sorting small arrays is used insertion sort with sentinel
>> instead of traditional, which has the structure:
>>
>> for (int i = left + 1; i <= right; i++) {
>>    for (j = i; j > left && a[j-1] > a[j]; j--) {
>>        swap(a[i], a[j-1]);
>>    }
>> }
>>
>> Note that range check j > left is performed on each iteration,
>> but really helps very rare. To avoid this expensive range check,
>> it was suggested to set minimum value (negative infinity) on the
>> first position. This type of suggestion is used in new version:
>>
>> if left bound > 0, we can put sentinel on a[left - 1], do insertion
>> sort without expensive check range, and then restore a[left - 1]
>> value. If left == 0, traditional insertion sort is used. Please,
>> look at the webrev for details.
>>
>> 2. In previous implementation 5 evenly spaced elements
>>
>> sixth = length / 6;
>> a[sixth], a[2 * sixth], a[3 * sixth], a[4 * sixth], a[5 * sixth]
>>
>> were used as candidates of pivots elements. This case is very
>> sensitive for period inputs, especially by 6. The new suggestion
>> is to take 5 center evenly spaced elements like this:
>>
>> int seventh = length / 7;
>> int midpoint = (left + right) >>> 1;
>>
>> a[midpoint - 2 * seventh], a[midpoint -     seventh], a[midpoint],
>> a[midpoint +     seventh], a[midpoint + 2 * seventh]
>>
>> and moreover, the seventh is calculated inexpensively:
>> seventh = (length >>> 3) + (length >>> 6) + 1;
>>
>> This schema works the same on random, ascendant, descendant, equal
>> inputs, but much better for period / stagger.
>>
>> 3. The whole structure
>>
>> <choose pivots>
>>
>> if (pivotsDiffer) {
>>    <do partitioning for two pivots>
>> }
>> else {
>>    <do partitioning for one pivot>
>> }
>>
>> <sort left and right parts>
>>
>> if (!pivotsDiffer) {
>>    return;
>> }
>> <swap internal pivot values to ends>
>>
>> <sort center part>
>>
>> was modified to:
>> ----------------
>>
>> <choose pivots>
>>
>> if (pivot1 < pivot2) {
>>    <do partitioning for two pivots>
>>    <swap pivots into their final positions>
>>    <sort left and right parts>
>>    <swap internal pivot values to ends>
>>    <sort center part>
>> }
>> else {
>>    <do partitioning for one pivot>
>>    <sort left and right parts>
>> }
>>
>> 4. Partitioning for both cases have not been changed at all.
>>
>> 5. Minor javadoc and format changes.
>>
>> Please, review new implementation,
>> any comments / suggestions are welcome!
>>
>> Thank you,
>> Vladimir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DualPivotQuicksort.java
Type: application/octet-stream
Size: 97083 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100429/2fd27a4a/DualPivotQuicksort.java>