Extending Java Arrays/Collection Sort API

Thu Nov 29 16:52:15 UTC 2018

Hi John & Brian,

Thanks for the proposed approach, I agree we are in the design discussion;
adding such API enhancements will take time, I said 13 to not hurry for 12
(CSR...).

John, you wrote me a very detailed letter, I will try to be more concise
and focus on Array.sort() API.

Le jeu. 29 nov. 2018 à 02:12, John Rose <john.r.rose at oracle.com> a écrit :

>
> >
> > In this renderer, I needed a fast sort algorithm on small almost sorted
> > arrays, but more important, to deal with 2 arrays: x values and their
> > corresponding edge indices (pointer like). I derived my MergeSort class
> to
> > perform 2 swaps (x & edge ptr) as Arrays.sort() or TimSort() did not
> > provide such API.
>
> There are two distinct requirements here:  1. Selecting or tweaking a
> sorting
> algorithms for particular expected patterns in the data set
> (almost-sorted).
> 2. Sorting two arrays in tandem, with one array supplying the sort key,
> and the other serving as a passively permuted payload ("PPP").
>
> Re 1: Another pattern sometimes expected is range-bounded values,
> which may prompt consideration of bin-sort.  Perhaps unique values.
> Permutation vectors have both properties.
>

Yes, the discussion has 2 points:
- provide more sorting algorithms in the public API
- provide new sort variants handling indices

For example, Julia has a simple sort API:
https://docs.julialang.org/en/v0.6.2/stdlib/sort/
It provides several basic sort functions (isort, qsort, msort) but the
general sort() accepts optional custom comparison (less) & transformation
functions.
Moreover, another sortperm() function that returns a permutation vector of
indices, it accepts an optional custom comparison (less).

>
> Re 2: In some cases, it's not two arrays that need the work.  Sometimes
> it is one array of keys, plus an index set; this can be reduced to two
> arrays
> one of which is an int-array carrying indexes, but in some proposals you
> see
> such an int-array appearing only as an output, with an implicit input of
> the iota-vector for the other array's indexes.
>
> More deeply, sometimes the index array is concrete, but the array of keys
> is virtualized.  Canonical example (for me) is a set of indexes into a data
> set such as a file.  (These indexes are dense for compression algorithms
> like B-W, or could be sparse for lexical analysis of bulk text in situ.)
> To
> handle this properly, a good old index array is just fine, as long as the
> sort algorithm *also* accepts an int-comparator.
>
> There is no strong need to generalize to three arrays in tandem, or to
> have separate algorithms for treating additional arrays as containing
> secondary keys instead of PPP.  This is because given a two-array tandem
> sort with PPP, plus an int-array sort with int-comparator, you can compose
> many other shapes at a modest footprint cost by adding two index vector
> to control the various sorts.
>

So you propose 2 new variants:
- sort(int[]a, int[] b, low, high) where b values are permuted
- sort(int[]a, low, high, Comparator<int, int>) where b values are permuted

I proposed the 2 array variants: sort( int[] values, int[] indices, low,
high) as it helps when the user wants both values and indices sorted
(easier use), but this can be reduced to sort(int[] indices, low, high,
comparator) as the user can use the permutation vector (indices) to reorder
values.

I would prefer adding the more general solution sort(int[] indices, low,
high, comparator) first.

However, having extracted the values is really helpful for performance
(locality) and simplifies the sorting algorithms but costs more data swaps.
For example in Marlin, my comparator would be:
Comparator<int, int>::compare(i,j) {
    return Integer.compareTo(edges[i].x, edges[j].x);
}
However, my edge data are packed in an Unsafe buffer so it is more:
Comparator<int, int>::compare(i,j) {
    return Integer.compareTo(Unsafe.getInt(addr0 + i), Unsafe.getInt(addr0
+ j) );
}
I wonder if such lookups may be costly (random access + overheads) in
contrary to having the values given as an extra int[].

If you want, I can prototype later both solutions...

>
> Simplified sketch of generalized tandem array sort:
>
> 1. Let L be the common length of the N arrays, A[N][L].
> 2. Create a new int index array P initialized to iota(L).
> 3. Create an int-comparator C over 0<=i,j<L which compares A[*][i] to
> A[*][j].
> 4. Sort P using C.  Now P is a permutation vector to be applied to each
> A[*].
> 5. Create a second index array Q such that Q[P[i]] = i for all i.  Q is
> P's inverse.
> 6. Tandem-sort each Q[*] (as a PPP) using a fresh copy of Q for each array.
>
> There is more than one way to do this; the above sketch is intended to be
> straightforward.  The temp. space required is two length-L int arrays.
>

I like you proposal, but I dislike object allocations so I would prefer
providing result arrays as arguments if possible.
In Marlin renderer, allocations are always done in its RendererContext +
XXXArrayCache (soft ref) to ensure no GC overhead.

>
> My point is that the "missing primitives" are something like a. reordering
> a PPP using a different array (either indexes or keys) for steering, and
> b. reordering an index vector (or other array of primitives) using a user
> supplied comparator.
>
> An iota generator would be nice too, as well as a permutation inverter.
> The tandem sort could be restricted to a "permute this array" operation,
> giving an index vector, although I don't think this simplifies the
> implementation,
> and it removes some use cases.  But those are minor points.
>

Could you illustrate that aspect ?

>
> > 1. I started discussing about extending the Java Sort API with Vladimir
> > Yaroslavskiy, the author of the famous Java DualPivotQuickSort, as he is
> > currently improving it.
> >
> > His coming DPQS 2018 class provides lot of sort algorithms, I propose to
> > make them public API through a facade method in Arrays class:
> > insertionSort(), heapSort(), mergeSort(), quickSort()... Of course, his
> > current code is dedicated to DPQS, but his algorithm implementation could
> > be generalized to propose the Best implementations as a public Java API
> > (stable & unstable sorts)
> >
> > For example in Marlin, I need trivial insertionSort() in the Helpers
> class,
> > my MergeSort could be replaced by another best implementation.
> > Let's not reinvent the wheel…
>
> Indeed, let's not.  But remember that the JDK does not claim to be (and
> cannot be) a collection of all the "wheels" we might need.
>

I quoted Julia API as it provides only 3 basic algorithms and OpenJDK
already has such implementations but not publicly exposed. That was my
first request. Ideally these implementations will be the fastest possible
(optimized).Moreover, isort() could rely on a new intrinsics to be as fast
as possible (native code) ?

>
> Here's an extra thought or two on tweakable algorithms:  One algorithm
> doesn't
> fit all data sets although it's amazing how far we've gone in optimizing
> the JDK's
> only sort algorithm.  The JDK can't support many new algorithms to go with
> sort
> (and, ahem, parallelSort), as insertionSort, mergeSort, etc., etc.  Doing
> this would
> amount to having a bunch of partially usable sort routines, more like
> "tryThiseSort"
> "orTryThisIfItDidNotWork", or "dontUseThisSort".  Actually documenting and
> testing
> a multi-kinded API for {insertion,merge,parallel,plain}Sort would be a
> nightmare,
> IMO, since the properties of sort algorithms are hard to explain and
> verify.
>

That was not my proposal, just few sort implementations that have their own
interest and are commonly useful within OpenJDK at least.

>
> It would be better if we could define an API for sorting algorithms, that
> a provider
> could set up.  Then the various API points in Arrays.*sort(*) could be
> documented
> as sugar for specific standard providers, via the API.  The provider API
> could
> include not only first-class entry points for sorting various kinds of
> arrays or
> tandem array pairs, but also provide queries for a user who wanted to be
> smart
> about which provider to choose.
>

That's a good idea, it looks like julia sort() that accepts an extra
algorithm argument to select the algorithm to use.

>
> > 2. New Sort API to deal with indices...
> >
> > For now, there is no mean to obtain the indices sorted on another value,
> > see my introduction: edge indices sorted by their x positions.
> > In data science, this is very useful and general.
> >
> > What about an Arrays API extension:
> > - Arrays.sort(int[] a, int[] b, low, high) or
> > Indices int[] Arrays.sortIndices(int[] a, low high), a[] left untouched
>
> Yes, good.  I don't think sortIndices buys much, since the internal
> algorithm will almost certainly create an iota vector and sort it instead
> of a.  I'd prefer an iota function such that this composition works
> like sortIndices:
>
>    int[] indices = indexRange(low, high);
>    sort(indices, (i, j) -> Integer.compare(a[i], a[j]));
>    return indices;
>

I agree, I proposed the extra sortIndices() as an helper method that
factorize your snippet.

>
> > - same with preallocated buffers & runs arrays
>
> I don't follow:  Is this a request for tandem sorting?  Off-heap sorting?
> I think the above primitives are likely to cover.
>

I mean preallocated buffers to avoid any allocation inside sort(), to avoid
GC overhead if the application already handles its own memory management
(like Marlin) or performs tons of sort() calls. Maybe having a SortContext
class would help and could be reused accross sort() invocations. Efficiency
matters here.

>
> (Exception:  Sorting more that 2 billion things is desirable, but will
> bust out of Java arrays.  That's why I'm not proposing sorting index
> arrays of type long, at present.  We need an array-like container
> of more than 2 billion entries before we graduate to long indexes.)
>
> > - alternatively, it could be expressed as Comparator<Primitive>
> > Arrays.sort(int[] a, low, high, Comparator<int, int> cmp) that sorts
> array
> > a using the given comparator. For example, a[] could be edge indices
> sorted
> > according to their edge x position...
> > This comparator is specific as it compares anything (external storage) at
> > given indices i,j.
>
> Exactly.  Except note the limit of 2 billion.
>

Agreed, but Java arrays are suffering this limit for so long ...

>
> > Finally Marlin renderer would benefit from Value types (array of
> structs),
> > as edge data are currently packed in int[] arrays, but this new Sort API
> > seems to me general & needed enough to be discussed here.
>
> Yes, value types will give better memory locality; today you have to use
> tandem sorts.  Also, value types will more naturally work with comparators,
> where we have to do something different with int-arrays (IntComparator).
>
> Valhalla is aiming at generic specialization also.  That, I believe, is the
> best future for sorting algorithms (and other algorithms too).  A
> specializable
> template class could contain *one copy* of a full sort algorithm, which
> would
> then be specialized to arrays of a reference, primitive, or value type.
> The
> suggestion of a "sort provider API" would mesh nicely with this:  The
> template
> class would implement that API.  The index type (currently hardwired as
> "int")
> could be a type parameter, instantly releasing us from the 2-billion limit.
> Tweakable algorithm parameters (such as "likely run size" or "maximum
> value")
> could be hardwired into the template (as if by a C #ifdef) simply by
> defining
> them as template parameters (a la C++ non-type template parameters).
>

Looks promising, I should try EA builds ... if ever I have some time ... to
port Marlin using Value types.

>
> A user-requested specialization of a template algorithm class would be a
> strong
> signal to the JVM that some "customized" code needs to be generated.
> Currently
> we have no such signals, leading to performance losses when heroic inlining
> fails to fully connect the user code to the sort algorithm code.  I call
> this the
> "loop customization problem".
>
> — John
>
> P.S. Loop customization problem:
>
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2012-September/008381.html
> http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001953.html
> Each "loop syndrome" for a sort or BLAS algorithm would be a distinct
> template class instance.
>
> P.P.S. Such a template algorithm class would work naturally with Arrays
> 2.0,
> which will (someday, in my dreams!) define template interfaces for
> extending
> Java arrays in new directions, notably with index types beyond int.
>
> Independently of sort algorithms, I expect an Arrays 2.0 array could be
> truly
> 2-D if its index space were pairs of ints (a value type).  It might
> internally store
> its data in some blocked cache-friendly form (a square sub-array in each
> cache
> line, maybe).  I wonder if there is some useful generalization of
> linear-ordered
> sorting which applies to 2-D arrays?  Clearly there are many interesting
> algorithms
> on multi-D arrays which could be instantiated and tweaked using template
> algorithm
> classes.
>
> I am lost, sorry. Let's have this extra discussion in another topic.

Best regards,
Laurent