Performance model of sorted() ?

Mon Nov 4 01:13:08 PST 2013

On Nov 1, 2013, at 7:18 PM, Brian Goetz <brian.goetz at Oracle.COM> wrote:

> The "slicing" part is right, but the single-threaded assumption is wrong.
> 
> What happens is that the upstream stateless ops will be done in 
> parallel, with the results being collected at the leaves of the 
> computation tree, into a bunch of mini-buffers.  Now, each will be 
> sorted (in parallel), and then we begin a parallel merge, collecting the 
> results into an array.  Then the downstream stateless ops and the 
> terminal op are run, in parallel.
> 

That is what we wanted/planned to do :-) but for reasons of expediency we currently resort to collecting, in parallel, to an array and then using Arrays.parallelSort e.g. the equivalent of:

  Stream s = ...
  s = s.parallel();
  Object[] a = s.toArray();
  Arrays.parallelSort(a);
  s = Arrays.stream(a);

So there is some redundancy and Arrays.parallelSort will not go parallel if the size of the array is less than 2^13.

Unfortunately it will be tricky to share any code from Arrays.parallelSort, thus requiring duplication x 4 (for refs + primitives). (Arrays.parallelArray splits into quarters to ensure the final sort is in the main array but in our case i don't think we require that.)

For an optimal implementation that computation tree needs to consist of nodes that are F/J tasks (such that collection and merging can happen concurrently) and we need to support sized and non-sized pipelines slightly differently for the collection phase. Given that we can use Arrays.parallelArray code as a template i think most of the hard work is done.

Paul.

> So, no truly sequential steps imposed by the framework (though binary 
> parallel merges are generally dominated by the last merge step.)