Spliterator

Brian Goetz brian.goetz at oracle.com
Mon Nov 19 14:17:54 PST 2012


The purpose of getSizeIfKnown is to eliminate copies.  If the leaf size 
is known, we can allocate the right size at each leaf rather than 
guessing; if all the sizes through the tree are known, then we can 
optimize things like:

   collection.stream().map(...).toArray();

so there is only a source array and a target array and the mapping 
happens in parallel into the right offset within the final target array, 
rather than allocating chunks for each leaf and merging them.  For the 
cases where this is possible, this could be a 2x performance difference. 
  But, this is tied to the non-interference requirement, that the source 
collection not change during the computation.

The missing bit of the spec, as Doug alludes to, is in the 
Stream-construction methods, where the non-interference constraint 
lives.  We should probably point from the Spliterator spec to those, but 
I didn't want to constrain Spliterator in this way because (for example) 
Doug might support concurrent modification in bulk ops on CHM.

On 11/19/2012 3:46 PM, Joe Bowbeer wrote:
> The spec for getSizeIfKnown() seems similar to that of
> InputStream.available(), which is spec'd so loosely that I don't think
> it should ever be used.  Similar to Thread.yield() in that respect.
>
>
>
> On Mon, Nov 19, 2012 at 12:37 PM, Doug Lea <dl at cs.oswego.edu
> <mailto:dl at cs.oswego.edu>> wrote:
>
>     On 11/16/12 11:40, Brian Goetz wrote:
>
>         Following up...we've eliminated StreamAccessor from the design,
>         so you only need
>         a Spliterator (and maybe stream flags) now to create a stream.
>           Here's the
>         latest API and spec for Spliterator.  Some explanations of the
>         choices are in
>         the original mail to which this is a reply.
>
>
>     My main concern is spec'ing/using the heuristic methods that
>     could do anything and return any answer vs provide required
>     functionality.
>
>     For example, getSizeIfKnown(). Is it required to promise that the
>     size will not change after return? Is it required to provide a result
>     if one is computationally possible rather than just handy?
>     Is it required to lie and return Integer.MAX_VALUE if it overflows int?
>     (Reminder: HashMaps and CHMs with > 1 << 31 elements are known to
>     exist out there.) These will be hard to spec exactly right.
>     And there are 4 more methods like this.
>
>     And I'm left wondering (as always) just how much good these
>     methods will do compared to an ultra-small/simple API,
>     considering that most collection developers would rather
>     put more time into writing custom versions of forEach, reduce,
>     etc rather than tuning their Spliterator hinting methods so that
>     the lambda-libs default versions work well?
>
>     Or, considering that each of these added methods seems
>     targeted at a particular data structure/shape (array,
>     tree, bounded list/seq), why not make subinterfaces for them?
>
>     interface Spliterator
>     interface SizedSpliterator extends Spliterator...
>     interface RandomAccessSpliterator extends SizedSpliterator ...
>     etc
>
>     On a little thought, I can't think of a reason not to do this?
>     Can you?
>
>     (Where to get this started, the minimal base version of
>     Spliterator has a split() method returning one with fewer
>     elements or null if it can't, and either is a or returns
>     an Iterator.)
>
>     -Doug
>
>


More information about the lambda-libs-spec-observers mailing list