Extending Java Arrays/Collection Sort API
Laurent Bourgès
bourges.laurent at gmail.com
Wed Nov 28 20:15:06 UTC 2018
Hi John & Brian,
Thank you for your feedback finally, we almost agree Java Sort API could be
improved, in jdk13 possibly.
>
> Doug is right that there is an enormous potential list of sort methods,
> and we can't include them all. But I do miss the ability to extract
> indexes instead of sorting the array itself.
>
> If you read my first email 11.14, I asked if you ever agree exposing few
other sort algorithms to have a public sort API useful on special cases
(isort, radix, merge sorts) when needed to manage specific types of
datasets.
Never mind: third-party libraries can provide advanced / other algorithms.
Vladimir Yaroslavskiy made huge progresses on his DPQS 18.11 and I hope it
will be the overall winner on almost all data distributions at all scale
soon to provide a better Arrays.sort() implementation soon.
> Or, slightly more generally, sorting an int[] or long[] array with a
> comparator.
> Sometimes you don't want an object per sorted entity; you want some kind
> of oracle function that can compare two indexes into a non-concrete
> (virtualized) set of values; the sort operation would output the reordered
> ints.
>
> Example: Sorting indexes into a large text file according to some
> algorithm
> such as Burrows-Wheeler. You don't want to make all the substrings and
> put them into a String[] array, and you don't even want to make
> CharSequence
> objects that view into the text file. You just want indexes into the text
> file to
> be sorted.
>
> IMO, the very simplest answer with significantly better semantics is:
>
> void sort(int[] a, (int[] a, int fromIndex, int toIndex,
> IntBinaryOperator cmp);
>
> And maybe add parallelSort, and use a new IntComparator instead of
> repurposing IntBinaryOperator.
>
> Extracting the permutation vector of indexes from an array sorting
> operation
> would then look like this:
>
> public static <T> int[] sortToIndexes(T[] a, Comparator<? super T> c) {
> int[] perm = new int[a.length]; // starts as an iota vector
> for (int i = 0; i < a.length; i++) perm[i] = i;
> sort(perm, 0, a.length, (i, j) -> c.compare(a[i], a[j])); // NEW
> PRIMITIVE
> return perm;
> }
>
> But there are other use cases for comparator-based sorting of indexes,
> which start not with an iota vector, but with an array of index-like values
> which may index into something other than an array (like a text file,
> which is why 'long[] a' might be useful too).
>
> In Valhalla many such use cases can be covered by using a flat array of
> values which encapsulate the integer (or long) indexes, and sorting that.
> The array of wrapped primitives will look like Object[] and so the existing
> API point with a comparator would allow the flat array of ints (dressed as
> value Objects) to be sorted according to any ad hoc ordering rule,
> including
> treating them as indexes.
>
> So maybe the answer is "wait for Valhalla". Still, the lack of a
> comparator
> on primitive arrays is an unnecessary limitation, which excludes
> index-based
> computations from the portfolio of our sort methods.
>
1. I am waiting for Value types for several years as Marlin renderer is
already using array of structs in java, ie edge data are packed in int[]
and unsafe buffers, to avoid bound checks (like C). It is a good candidate
to evaluate this new feature.
I can help here, but not alone.
2. John, your proposal matches mine:
"alternatively, it could be expressed as Comparator<Primitive>
Arrays.sort(int[] a, low, high, Comparator<int, int> cmp) that sorts array
a using the given comparator. For example, a[] could be edge indices sorted
according to their edge x position...
This comparator is specific as it compares anything (external storage) at
given indices i,j."
Let's go for 13...
If I may insist a bit on the 2 arrays variant, Marlin needs the x array
sorted for future traversal...
I do not know what is faster: sorting 2 arrays (indices swapped according
to x array order: 2x times swaps) or using a comparator (lookups to compare
(x[i], x[j] ) + x array reconstruction).
Finally how can we manage such improvement in 13? CSR / RFE ?
John or Brian, could you lead that specification request (CSR) and formal
process ?
I can propose implementations on my side, I already have a working DPQS
18.11 with 2 arrays on Marlin's github.
Regards,
Laurent
More information about the core-libs-dev
mailing list