Into

Brian Goetz brian.goetz at oracle.com
Fri Dec 21 09:50:57 PST 2012


I'm starting to dislike "into".

First, it's the only stream method which retains mutable state from the 
user.  That's not great.

Second, the parallel story is bad.  People are going to write

   list.parallel(e -> e+1).into(new ArrayList<>());

which will do a whole lot of trivial computation in parallel, wait on 
the barrier implicit in sequential(), and then do an O(n) serial thing.

Third, the semantics are weird; we do this clever trick where 
collections have to decide whether to do insertion in serial or 
parallel.  But as we all learned from Spinal Tap, there's a fine line 
between clever and stupid.

Instead, we could treat this like a mutable reduce, where leaves are 
reduced to a List, and lists are merged as we go up the tree.  Even with 
dumb merging is still going to be much faster than what we've got now; 
no barrier, no buffer the whole thing and copy, and the worst serial 
step is O(n/2) instead of O(n).  So probably 3x better just by improving 
the serial fractions.  But with a smarter combination step, we can do 
better still.  If we have a "concatenated list view" operation (List 
concat(List a, List b)), which returns a read-only, conc-tree 
representation), then the big serial stage goes away.

And, of course, building atop reduce makes the whole thing simpler; 
there are fewer ops that have their own distinct semantics, and the 
semantics of into() is about as weird as you get.


Now that the tabulators framework gets users comfortable with the 
explicit choice between functional and concurrent aggregation for 
tabulation, it is a much shorter hop to get there.  So let's build on 
that and find some sort of way to surface mutable and concurrent 
versions of "into".  (Currently we have no good concurrent list-shaped 
collections, but hopefully that changes.)

Something like:

   Stream.tabulate(collector(ArrayList::new))
   Stream.tabulate(concurrentCollector(ConcurrentFooList::new))

Maybe with some rename of tabulate.

I think there's a small reorganization of naming lurking here (involving 
tabulate, Tabulator, ConcurrentTabulator, MutableReducer, reduce) that 
recasts into() either as an explicit functional or concurrent 
tabulation.  And one more tricky+slow special-purpose op bites the dust, 
in favor of something that builds on our two favorite primitives, fold 
(order-preserving) and forEach (not order-preserving.)




More information about the lambda-libs-spec-observers mailing list