Extending Java Arrays/Collection Sort API

Wed Dec 5 08:35:09 UTC 2018

John,

Any feedback ?
We could discuss that during next OpenJDK workshop, but I would prefer
going slowly but surely.

Laurent

Le jeu. 29 nov. 2018 à 17:52, Laurent Bourgès <bourges.laurent at gmail.com> a
écrit :

> Hi John & Brian,
>
> Thanks for the proposed approach, I agree we are in the design discussion;
> adding such API enhancements will take time, I said 13 to not hurry for 12
> (CSR...).
>
> John, you wrote me a very detailed letter, I will try to be more concise
> and focus on Array.sort() API.
>
> Le jeu. 29 nov. 2018 à 02:12, John Rose <john.r.rose at oracle.com> a écrit :
>
>>
>> >
>> > In this renderer, I needed a fast sort algorithm on small almost sorted
>> > arrays, but more important, to deal with 2 arrays: x values and their
>> > corresponding edge indices (pointer like). I derived my MergeSort class
>> to
>> > perform 2 swaps (x & edge ptr) as Arrays.sort() or TimSort() did not
>> > provide such API.
>>
>> There are two distinct requirements here:  1. Selecting or tweaking a
>> sorting
>> algorithms for particular expected patterns in the data set
>> (almost-sorted).
>> 2. Sorting two arrays in tandem, with one array supplying the sort key,
>> and the other serving as a passively permuted payload ("PPP").
>>
>> Re 1: Another pattern sometimes expected is range-bounded values,
>> which may prompt consideration of bin-sort.  Perhaps unique values.
>> Permutation vectors have both properties.
>>
>
> Yes, the discussion has 2 points:
> - provide more sorting algorithms in the public API
> - provide new sort variants handling indices
>
> For example, Julia has a simple sort API:
> https://docs.julialang.org/en/v0.6.2/stdlib/sort/
> It provides several basic sort functions (isort, qsort, msort) but the
> general sort() accepts optional custom comparison (less) & transformation
> functions.
> Moreover, another sortperm() function that returns a permutation vector of
> indices, it accepts an optional custom comparison (less).
>
>
>
>>
>> Re 2: In some cases, it's not two arrays that need the work.  Sometimes
>> it is one array of keys, plus an index set; this can be reduced to two
>> arrays
>> one of which is an int-array carrying indexes, but in some proposals you
>> see
>> such an int-array appearing only as an output, with an implicit input of
>> the iota-vector for the other array's indexes.
>>
>> More deeply, sometimes the index array is concrete, but the array of keys
>> is virtualized.  Canonical example (for me) is a set of indexes into a
>> data
>> set such as a file.  (These indexes are dense for compression algorithms
>> like B-W, or could be sparse for lexical analysis of bulk text in situ.)
>> To
>> handle this properly, a good old index array is just fine, as long as the
>> sort algorithm *also* accepts an int-comparator.
>>
>> There is no strong need to generalize to three arrays in tandem, or to
>> have separate algorithms for treating additional arrays as containing
>> secondary keys instead of PPP.  This is because given a two-array tandem
>> sort with PPP, plus an int-array sort with int-comparator, you can compose
>> many other shapes at a modest footprint cost by adding two index vector
>> to control the various sorts.
>>
>
> So you propose 2 new variants:
> - sort(int[]a, int[] b, low, high) where b values are permuted
> - sort(int[]a, low, high, Comparator<int, int>) where b values are permuted
>
> I proposed the 2 array variants: sort( int[] values, int[] indices, low,
> high) as it helps when the user wants both values and indices sorted
> (easier use), but this can be reduced to sort(int[] indices, low, high,
> comparator) as the user can use the permutation vector (indices) to reorder
> values.
>
> I would prefer adding the more general solution sort(int[] indices, low,
> high, comparator) first.
>
> However, having extracted the values is really helpful for performance
> (locality) and simplifies the sorting algorithms but costs more data swaps.
> For example in Marlin, my comparator would be:
> Comparator<int, int>::compare(i,j) {
>     return Integer.compareTo(edges[i].x, edges[j].x);
> }
> However, my edge data are packed in an Unsafe buffer so it is more:
> Comparator<int, int>::compare(i,j) {
>     return Integer.compareTo(Unsafe.getInt(addr0 + i), Unsafe.getInt(addr0
> + j) );
> }
> I wonder if such lookups may be costly (random access + overheads) in
> contrary to having the values given as an extra int[].
>
> If you want, I can prototype later both solutions...
>
>
>>
>> Simplified sketch of generalized tandem array sort:
>>
>> 1. Let L be the common length of the N arrays, A[N][L].
>> 2. Create a new int index array P initialized to iota(L).
>> 3. Create an int-comparator C over 0<=i,j<L which compares A[*][i] to
>> A[*][j].
>> 4. Sort P using C.  Now P is a permutation vector to be applied to each
>> A[*].
>> 5. Create a second index array Q such that Q[P[i]] = i for all i.  Q is
>> P's inverse.
>> 6. Tandem-sort each Q[*] (as a PPP) using a fresh copy of Q for each
>> array.
>>
>> There is more than one way to do this; the above sketch is intended to be
>> straightforward.  The temp. space required is two length-L int arrays.
>>
>
> I like you proposal, but I dislike object allocations so I would prefer
> providing result arrays as arguments if possible.
> In Marlin renderer, allocations are always done in its RendererContext +
> XXXArrayCache (soft ref) to ensure no GC overhead.
>
>
>>
>> My point is that the "missing primitives" are something like a. reordering
>> a PPP using a different array (either indexes or keys) for steering, and
>> b. reordering an index vector (or other array of primitives) using a user
>> supplied comparator.
>>
>> An iota generator would be nice too, as well as a permutation inverter.
>> The tandem sort could be restricted to a "permute this array" operation,
>> giving an index vector, although I don't think this simplifies the
>> implementation,
>> and it removes some use cases.  But those are minor points.
>>
>
> Could you illustrate that aspect ?
>
>
>>
>> > 1. I started discussing about extending the Java Sort API with Vladimir
>> > Yaroslavskiy, the author of the famous Java DualPivotQuickSort, as he is
>> > currently improving it.
>> >
>> > His coming DPQS 2018 class provides lot of sort algorithms, I propose to
>> > make them public API through a facade method in Arrays class:
>> > insertionSort(), heapSort(), mergeSort(), quickSort()... Of course, his
>> > current code is dedicated to DPQS, but his algorithm implementation
>> could
>> > be generalized to propose the Best implementations as a public Java API
>> > (stable & unstable sorts)
>> >
>> > For example in Marlin, I need trivial insertionSort() in the Helpers
>> class,
>> > my MergeSort could be replaced by another best implementation.
>> > Let's not reinvent the wheel…
>>
>> Indeed, let's not.  But remember that the JDK does not claim to be (and
>> cannot be) a collection of all the "wheels" we might need.
>>
>
> I quoted Julia API as it provides only 3 basic algorithms and OpenJDK
> already has such implementations but not publicly exposed. That was my
> first request. Ideally these implementations will be the fastest possible
> (optimized).Moreover, isort() could rely on a new intrinsics to be as fast
> as possible (native code) ?
>
>
>>
>> Here's an extra thought or two on tweakable algorithms:  One algorithm
>> doesn't
>> fit all data sets although it's amazing how far we've gone in optimizing
>> the JDK's
>> only sort algorithm.  The JDK can't support many new algorithms to go
>> with sort
>> (and, ahem, parallelSort), as insertionSort, mergeSort, etc., etc.  Doing
>> this would
>> amount to having a bunch of partially usable sort routines, more like
>> "tryThiseSort"
>> "orTryThisIfItDidNotWork", or "dontUseThisSort".  Actually documenting
>> and testing
>> a multi-kinded API for {insertion,merge,parallel,plain}Sort would be a
>> nightmare,
>> IMO, since the properties of sort algorithms are hard to explain and
>> verify.
>>
>
> That was not my proposal, just few sort implementations that have their
> own interest and are commonly useful within OpenJDK at least.
>
>
>>
>> It would be better if we could define an API for sorting algorithms, that
>> a provider
>> could set up.  Then the various API points in Arrays.*sort(*) could be
>> documented
>> as sugar for specific standard providers, via the API.  The provider API
>> could
>> include not only first-class entry points for sorting various kinds of
>> arrays or
>> tandem array pairs, but also provide queries for a user who wanted to be
>> smart
>> about which provider to choose.
>>
>
> That's a good idea, it looks like julia sort() that accepts an extra
> algorithm argument to select the algorithm to use.
>
>
>>
>> > 2. New Sort API to deal with indices...
>> >
>> > For now, there is no mean to obtain the indices sorted on another value,
>> > see my introduction: edge indices sorted by their x positions.
>> > In data science, this is very useful and general.
>> >
>> > What about an Arrays API extension:
>> > - Arrays.sort(int[] a, int[] b, low, high) or
>> > Indices int[] Arrays.sortIndices(int[] a, low high), a[] left untouched
>>
>> Yes, good.  I don't think sortIndices buys much, since the internal
>> algorithm will almost certainly create an iota vector and sort it instead
>> of a.  I'd prefer an iota function such that this composition works
>> like sortIndices:
>>
>>    int[] indices = indexRange(low, high);
>>    sort(indices, (i, j) -> Integer.compare(a[i], a[j]));
>>    return indices;
>>
>
> I agree, I proposed the extra sortIndices() as an helper method that
> factorize your snippet.
>
>
>>
>> > - same with preallocated buffers & runs arrays
>>
>> I don't follow:  Is this a request for tandem sorting?  Off-heap sorting?
>> I think the above primitives are likely to cover.
>>
>
> I mean preallocated buffers to avoid any allocation inside sort(), to
> avoid GC overhead if the application already handles its own memory
> management (like Marlin) or performs tons of sort() calls. Maybe having a
> SortContext class would help and could be reused accross sort()
> invocations. Efficiency matters here.
>
>
>>
>> (Exception:  Sorting more that 2 billion things is desirable, but will
>> bust out of Java arrays.  That's why I'm not proposing sorting index
>> arrays of type long, at present.  We need an array-like container
>> of more than 2 billion entries before we graduate to long indexes.)
>>
>> > - alternatively, it could be expressed as Comparator<Primitive>
>> > Arrays.sort(int[] a, low, high, Comparator<int, int> cmp) that sorts
>> array
>> > a using the given comparator. For example, a[] could be edge indices
>> sorted
>> > according to their edge x position...
>> > This comparator is specific as it compares anything (external storage)
>> at
>> > given indices i,j.
>>
>> Exactly.  Except note the limit of 2 billion.
>>
>
> Agreed, but Java arrays are suffering this limit for so long ...
>
>
>>
>> > Finally Marlin renderer would benefit from Value types (array of
>> structs),
>> > as edge data are currently packed in int[] arrays, but this new Sort API
>> > seems to me general & needed enough to be discussed here.
>>
>> Yes, value types will give better memory locality; today you have to use
>> tandem sorts.  Also, value types will more naturally work with
>> comparators,
>> where we have to do something different with int-arrays (IntComparator).
>>
>> Valhalla is aiming at generic specialization also.  That, I believe, is
>> the
>> best future for sorting algorithms (and other algorithms too).  A
>> specializable
>> template class could contain *one copy* of a full sort algorithm, which
>> would
>> then be specialized to arrays of a reference, primitive, or value type.
>> The
>> suggestion of a "sort provider API" would mesh nicely with this:  The
>> template
>> class would implement that API.  The index type (currently hardwired as
>> "int")
>> could be a type parameter, instantly releasing us from the 2-billion
>> limit.
>> Tweakable algorithm parameters (such as "likely run size" or "maximum
>> value")
>> could be hardwired into the template (as if by a C #ifdef) simply by
>> defining
>> them as template parameters (a la C++ non-type template parameters).
>>
>
> Looks promising, I should try EA builds ... if ever I have some time ...
> to port Marlin using Value types.
>
>
>>
>> A user-requested specialization of a template algorithm class would be a
>> strong
>> signal to the JVM that some "customized" code needs to be generated.
>> Currently
>> we have no such signals, leading to performance losses when heroic
>> inlining
>> fails to fully connect the user code to the sort algorithm code.  I call
>> this the
>> "loop customization problem".
>>
>> — John
>>
>> P.S. Loop customization problem:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2012-September/008381.html
>> http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001953.html
>> Each "loop syndrome" for a sort or BLAS algorithm would be a distinct
>> template class instance.
>>
>> P.P.S. Such a template algorithm class would work naturally with Arrays
>> 2.0,
>> which will (someday, in my dreams!) define template interfaces for
>> extending
>> Java arrays in new directions, notably with index types beyond int.
>>
>> Independently of sort algorithms, I expect an Arrays 2.0 array could be
>> truly
>> 2-D if its index space were pairs of ints (a value type).  It might
>> internally store
>> its data in some blocked cache-friendly form (a square sub-array in each
>> cache
>> line, maybe).  I wonder if there is some useful generalization of
>> linear-ordered
>> sorting which applies to 2-D arrays?  Clearly there are many interesting
>> algorithms
>> on multi-D arrays which could be instantiated and tweaked using template
>> algorithm
>> classes.
>>
>> I am lost, sorry. Let's have this extra discussion in another topic.
>
> Best regards,
> Laurent
>