Stream constructors for stream(Iterator) in StreamSupport?

Brian Goetz brian.goetz at oracle.com
Sat Apr 13 12:25:54 PDT 2013


Good question.  Here's my reasoning about why I thought it lives better 
in SS than S; let me know if you find this argument compelling.  (Also, 
this speaks to an area currently missing in the docs.)

There are lots of ways to make a stream, and some are better than 
others.  The absolute worst is via an Iterator.

The best way is to get one from your data source directly (e.g., 
ArrayList.stream()).  The streams provided by collections and other JDK 
classes have highly optimized spliterators (thanks Doug!), work directly 
with knowledge of the data structure, are late-binding to minimize 
CME-like interference, and preserve the most information (such as 
sorted-ness, sized-ness, distinct-ness) that the streams framework can 
use directly to optimize execution.
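
To make "preserve the most information" concrete, here is a minimal sketch that asks two JDK collections what their spliterators report. It assumes the Spliterator characteristic names as they ended up in Java 8; the class and method names (SourceCharacteristics, describe) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Spliterator;
import java.util.TreeSet;

final class SourceCharacteristics {
    // Lists which of the characteristics discussed above a spliterator reports.
    // (Hypothetical helper; only hasCharacteristics() is JDK API.)
    static String describe(Spliterator<?> s) {
        StringBuilder sb = new StringBuilder();
        if (s.hasCharacteristics(Spliterator.SIZED)) sb.append("SIZED ");
        if (s.hasCharacteristics(Spliterator.SUBSIZED)) sb.append("SUBSIZED ");
        if (s.hasCharacteristics(Spliterator.SORTED)) sb.append("SORTED ");
        if (s.hasCharacteristics(Spliterator.DISTINCT)) sb.append("DISTINCT ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        TreeSet<Integer> set = new TreeSet<>(list);

        // The array-backed list knows its exact size, and so do its splits.
        System.out.println("ArrayList: " + describe(list.spliterator()));
        // The sorted set additionally knows it is sorted and duplicate-free.
        System.out.println("TreeSet:   " + describe(set.spliterator()));
    }
}
```

An Iterator obtained from either collection carries none of this; it can only answer hasNext() and next().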

The next best way is via one of the factories in Streams -- things like 
intRange, iterate, generate.  These are more flexible than they first 
appear; for example, if you have a function int -> T, and you want to 
generate a sequence of f(0), f(1), ... f(n) in a parallel-friendly way, 
you can just do:

   intRange(0, n).map(f);
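
For readers working with the API as it shipped in Java 8, the same idea can be sketched with IntStream.range and mapToObj. This is an assumed equivalent, not the code from the thread; the names RangeMapSketch and tabulate are illustrative. Note that IntStream.range excludes its upper bound, so this yields f(0) through f(n-1).

```java
import java.util.List;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class RangeMapSketch {
    // Builds [f(0), f(1), ..., f(n-1)]. The range source knows its size and
    // splits evenly, so the pipeline decomposes well when run in parallel.
    static <T> List<T> tabulate(int n, IntFunction<T> f) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(f)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Encounter order is preserved even though evaluation is parallel.
        System.out.println(tabulate(5, i -> i * i));
    }
}
```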

The next best way is via a Spliterator that properly declares its 
properties, is SIZED, SUBSIZED, and has a good trySplit implementation.  
These will ensure that things decompose well.  Many of the JDK 
spliterators have these characteristics.
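
As a rough illustration of what such a spliterator looks like, here is a minimal sketch over a slice of an array. It assumes the Spliterator interface as it ended up in Java 8; SliceSpliterator is a made-up class, not one of the JDK's own implementations.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical example class: covers array[index .. fence).
final class SliceSpliterator<T> implements Spliterator<T> {
    private final T[] array;
    private final int fence;  // one past the last element covered
    private int index;        // next element to hand out

    SliceSpliterator(T[] array, int origin, int fence) {
        this.array = array;
        this.index = origin;
        this.fence = fence;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (index >= fence) return false;
        action.accept(array[index++]);
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        // Split at the midpoint: hand off the first half, keep the second.
        int lo = index;
        int mid = (lo + fence) >>> 1;
        if (lo >= mid) return null;   // too small to split further
        index = mid;
        return new SliceSpliterator<>(array, lo, mid);
    }

    @Override
    public long estimateSize() {
        return fence - index;         // exact, which is what SIZED promises
    }

    @Override
    public int characteristics() {
        // SUBSIZED: every spliterator produced by trySplit is also SIZED.
        return ORDERED | SIZED | SUBSIZED;
    }

    public static void main(String[] args) {
        Integer[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        int sum = StreamSupport.stream(new SliceSpliterator<>(data, 0, data.length), true)
                .mapToInt(Integer::intValue)
                .sum();
        System.out.println(sum);
    }
}
```

Because every split is balanced and exactly sized, the framework can plan the parallel decomposition up front instead of guessing.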

We then slide down the scale of spliterator quality; SUBSIZED is 
probably the first to go, then SIZED, then trySplit.  As the spliterator 
quality degrades, the quality of decomposition and opportunity for 
pipeline optimization degrades too.

We then come to the bottom of the barrel, iterators.  Making a 
Spliterator from an iterator sucks in at least the following ways:
  - Splitting will suck.  We can still extract some parallelism for 
high-Q problems, but it will never be good, placing a lid on how much 
parallelism you can get.
  - Iterators throw away a lot of useful information about the 
underlying data source, such as its size.  It may be that whoever wrote 
the Iterator knows the size, but the Iterator does not.  (We've got an 
iterator+size to spliterator conversion, but that's brittle because of 
"early binding" to the size information.)
  - Element access overhead.  One of the reasons for doing Spliterator 
is that Iterator sucks so badly!  (High per-element cost; two method 
calls per element, often with redundant computation due to required 
defensive coding; Iterator protocol often requires lookahead and 
buffering; inherent race between hasNext() and next().)  So you're 
taking a sucky way to get elements out of a source, and wrapping it with 
more junk.
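
For reference, the last-resort conversion can be sketched as follows, using the java.util.Spliterators adapters together with StreamSupport.stream(Spliterator, boolean) as those methods exist in Java 8. The wrapper class IteratorStreams and its two methods are hypothetical, shown only to make the tradeoff visible.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IteratorStreams {
    // Size unknown: the resulting spliterator is not SIZED, and its size
    // estimate is Long.MAX_VALUE, so splitting is poor.
    static <T> Stream<T> stream(Iterator<T> it) {
        Spliterator<T> sp = Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        return StreamSupport.stream(sp, false);
    }

    // Caller-supplied size: better, but the size is bound here, at
    // construction time. If the source changes before the stream is
    // consumed, the reported size is simply wrong.
    static <T> Stream<T> stream(Iterator<T> it, long size) {
        Spliterator<T> sp = Spliterators.spliterator(it, size, Spliterator.ORDERED);
        return StreamSupport.stream(sp, false);
    }

    public static void main(String[] args) {
        Iterator<String> legacy = Arrays.asList("a", "b", "c").iterator();
        stream(legacy).map(String::toUpperCase).forEach(System.out::println);
    }
}
```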


So, while Iterator to Stream is still a fine last resort, putting it in 
Streams will likely have the unfortunate effect of guiding users to the 
worst way of making a stream, without fully understanding the tradeoffs.



On 4/13/2013 12:06 PM, Tim Peierls wrote:
> Doesn't that seem like something that belongs in Streams? If you're
> stuck with a legacy API that exposes Iterator but not Iterable, you'd
> still want to be able to make a Stream out of it, and you wouldn't want
> to have to look in StreamSupport for that. It's a lot different from
> stream(Spliterator).
>
> On Sat, Apr 13, 2013 at 11:24 AM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     Currently StreamSupport contains seq/par versions of
>        stream(Spliterator)
>        stream(Supplier<Spliterator>)
>     for ref/int/long/double.
>
>     In java.util.Spliterators, there are adapters to turn an Iterator
>     into a Spliterator.
>
>     I think we should add convenience factories for
>
>        stream(Iterator)
>
>     to StreamSupport as well.
>
>


More information about the lambda-libs-spec-observers mailing list