Dual-Pivot Quicksort improvements for highly structured (nearly sorted) arrays and data with small periods

Fri Feb 4 18:48:10 UTC 2011

This looks like really good work. I have a few comments and questions about this webrev. None of these comments block commit.

- The @version tag contains what appears to be a value from a version control system. I thought all of these had been or were being eliminated.

- I am curious how the MAX_RUN_COUNT, MAX_RUN_LENGTH and THRESHOLD values are derived. We should provide some back link to how these values were chosen at so that future maintainers can decided when it is appropriate (or not) to change the values.

- Are the seven implementations generated from a template or are they hand-coded? I didn't check that there are no unexplained differences between the implementation but am curious if it would be possible that differences could arise or exist.

Mike

On Jan 20 2011, at 10:09 , Vladimir Iaroslavski wrote:

> Hello,
> 
> Here is improvement for sorting primitives: 
> http://cr.openjdk.java.net/~alanb/7013585/webrev
> 
> Highly structured (nearly sorted) data like
> 1,2,3,..,k,1,2,3,..,k,... or
> 1,2,3,...,n/2,n/2,...,3,2,1 etc.
> is sorted very effectively by merge sort.
> 
> The main idea of this optimization is to check
> if given array is nearly sorted and apply modified
> and improved merge sort on it. Otherwise, Dual-Pivot
> Quicksort is applied.
> 
> Note that the check doesn't decrease the performance of sorting.
> 
> Other optimization is related to random or repeated data
> with small period, m <= 10. This type of data is sorted
> faster by Quicksort with one pivot but with DNF (Dutch
> National Flag) partition. Therefore, sorting is switched
> to DNF even calculated pivots are different.
> 
> The boost of performance with new optimizations is:
> 
> random data: new: 135, old: 154, ratio: 87.6%
> ascending: new: 3, old: 45, ratio: 6.6%
> descending: new: 5, old: 48, ratio: 10.4%
> organ pipes: new: 20, old: 80, ratio: 25%
> period(m), m <= 10: new: 10, old: 15, ratio: 66.6%
> random(m), m <= 10: new: 11, old: 16, ratio: 68.7%
> equal: new: 2.5, old: 3, new: 83.3%
> 
> On test suite suggested by Jon Bentley we see
> the following performance:
> 
> client VM: old 56%, new 43%
> server VM: old 46%, new 29%
> 
> Thank you,
> Vladimir