Patch to improve primitives Array.sort()

Paul Sandoz paul.sandoz at oracle.com
Fri Apr 24 08:57:01 UTC 2015


See here:

  http://cr.openjdk.java.net/~psandoz/tmp/gs/sort/webrev/

Some very quick comments as i have not yet had time to review more closely:

- IANAL so i dunno about the GS copyright in the files.

- The constant MAX_RUN_LENGTH is no longer used so could be removed. But i would like to understand why it's no longer required.

- There is quite a bit of duplication in the tests. AFAICT data sources are all derived from ints that are then converted. The sources could be data providers, so only one test method per data type is required, each data can come with a descriptive string so it shows up in the test reports. The goal here being if another source of data is added (which is derivable) it could be added just once.

Paul.


On Apr 24, 2015, at 9:17 AM, "Chan, Sunny" <Sunny.Chan at gs.com> wrote:

> We are proposing a patch to improve the performance for the DualPivotQuickSort use by Array.sort to sort primitives array. We have identified two area for optimization:
> 
> Firstly, we have changed the algorithm to determine what a "run" is. A "run" is how long you go through the array with it being in order. For example, an array of 1, 2, 3, 5, 4, 3 would have a run count equal to 2 (two parts that are in order).
> 
> The JDK implementation stops looking for runs if it comes across two equal numbers at the beginning of a run, eg.
> 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 4, 5
> And then sorts the whole array, as seen in this part of the algorithm:
> } else { // equal
>    for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {
>        if (--m == 0) {
>            sort(a, left, right, true);
>            return;
>        }
>    }
> }
> 
> Instead, we detect and equal beginning of a run at the top of the for loop, as shown here:
> while (k < right && a[k] == a[++k]); //equal
> 
> so the array mentioned above would instead have one run, be determined already sorted, and sort would not be called as it would in the original JDK implementation.
> 
> Second optimization is in the case of an array that is descending, and then ascending.
> 
> Since the algorithm does swapping in the case of descending numbers, by the end of all swapping the array is sorted. We detect this through this if statement:
> if (run[count] > left && a[run[count]] >= a[run[count] - 1]) {
>    count--;
> }
> And will stop sorting, unlike the JDK.
> 
> I have attached a webrev.zip containing the patch and unit test that verifies the sort is correct. We also have a couple of JMH based performance tests which is not included. If the JMH is available we can also make those performance test available as well.
> 
> The patch is author by Kristen O'Leary (Kristen.o'leary at gs.com) and myself. Please attribute both of us when committing. You can find my OCA entry under Goldman Sachs and as we are not authors yet we can't raise a bug report in the database.
> 
> Please let us know if you need further clarification or modification to the patch.
> 
> Sunny Chan, Executive Director, Technology
> Goldman Sachs (Asia) L.L.C. | 68th Floor | Cheung Kong Center | 2 Queens Road Central | Hong Kong
> email:  sunny.chan at gs.com | Tel: +852 2978 6481 | Fax: +852 2978 0633
> 
> This message may contain information that is confidential or privileged.  If you are not the intended recipient, please advise the sender immediately and delete this message.  See http://www.gs.com/disclaimer/email for further information on confidentiality and the risks inherent in electronic communication.
> 




More information about the core-libs-dev mailing list