Extending Java Arrays/Collection Sort API

Laurent Bourgès bourges.laurent at gmail.com
Tue Nov 20 13:53:32 UTC 2018


Hi Doug,
That's a pleasure to read you.

Le mar. 20 nov. 2018 à 13:17, Doug Lea <dl at cs.oswego.edu> a écrit :

> On 11/20/18 2:50 AM, Laurent Bourgès wrote:
> > Hi,
> > Any feedback on improving Java Sort API ?
>
> I think most considerations await progress on value types. I posted some
> ideas and links to code last summer on Valhalla list:
> http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-August/thread.html


I already followed this thread and I am looking for Value types for long.
I looked at your CVSorter class and it looks promising, I can try adapting
it to my needs and compare its own performance in my Sort Race 2018.

Even if Value types are coming, the Arrays/Collections sort() API will
remain and must still evolve to provide the best (overall) sort.
As said previously, adaptive algorithms already embed several Sort
algorithms (insertion sort, merge sort, dual pivot quick sort, radix sort)
so I proposed to wrap them as public API to let the developper select the
best sort according to its needs or data sets.


> They didn't include APIs for sorting based on indexes, which I agree
> should be considered. However, until value types arrive, it would be
> hard to justify adding variants for each primitive type.
>

Thanks, you agree it is useful. For now I only focused on sort(int[] a,
int[] indices, low, high) for my own needs.
As Java arrays are still only integer based, I propose to only support
int[] indices.
I already implemented such method in my DualPivotQuicksort201802Ext (that
uses pre-allocation):
https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/src/edu/sorting/DualPivotQuicksort201802Ext.java

static void sort(int[] a, int[] b, int low, int high, int[] auxA, int[] auxB,
int[] run) { ... }
note: int[] auxA, int[] auxB, int[] run are given for pre-allocation
purposes


>
> Almost every sorting algorithm is best for some data types and inputs.
> And there are a lot of algorithms. Among the hard choices for JDK is to
> pick a small a only a few (stable vs nonstable, parallelizable, etc),
> that together provide good choices for most usages. Even then, some
> usages (possibly including the one in your initial post) might be better
> off creating a custom solution. (As is true for all JDK
> Collections/Array-related components.)
>

So is my proposal: Arrays/Collection sort() must be the overall winner (all
data distributions, all array lengths) ~ best compromise.

For particular needs, the JDK or a sort library could provide specific sort
implementations.
To that purpose, I have collected 15 sort (working) implementations:
https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/src/edu/sorting/perf/IntSorter.java

Best regards,
Laurent


> >     What about an Arrays API extension:
> >     - Arrays.sort(int[] a, int[] b, low, high) or
> >     Indices int[] Arrays.sortIndices(int[] a, low high), a[] left
> untouched
> >     - same with preallocated buffers & runs arrays
> >     - alternatively, it could be expressed as Comparator<Primitive>
> >     Arrays.sort(int[] a, low, high, Comparator<int, int> cmp) that sorts
> >     array a using the given comparator. For example, a[] could be edge
> >     indices sorted according to their edge x position...
> >     This comparator is specific as it compares anything (external
> >     storage) at given indices i,j.
> >
>
>


More information about the core-libs-dev mailing list