Stream constructors for stream(Iterator) in StreamSupport?
Brian Goetz
brian.goetz at oracle.com
Sat Apr 13 12:25:54 PDT 2013
Good question. Here's my reasoning about why I thought it lives better
in SS than S; let me know if you find this argument compelling. (Also,
this speaks to an area currently missing in the docs.)
There are lots of ways to make a stream, and some are better than
others. The absolute worst is via an Iterator.
The best way is to get one from your data source directly (e.g.,
ArrayList.stream()). The streams provided by collections and other JDK
classes have highly optimized spliterators (thanks Doug!), work directly
with knowledge of the data structure, are late-binding to minimize
CME-like interference, and preserve the most information (such as
sorted-ness, sized-ness, distinct-ness) that the streams framework can
use directly to optimize execution.
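To make "preserve the most information" concrete, here is a small sketch that asks a couple of JDK collections what their spliterators report. It uses the Spliterator API as it eventually shipped in Java 8; the class name CharacteristicsSketch and the helper describe() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsSketch {
    // Hypothetical helper: lists a few of the characteristics a spliterator reports.
    static String describe(Spliterator<?> s) {
        StringBuilder sb = new StringBuilder();
        if (s.hasCharacteristics(Spliterator.SIZED))    sb.append("SIZED ");
        if (s.hasCharacteristics(Spliterator.SUBSIZED)) sb.append("SUBSIZED ");
        if (s.hasCharacteristics(Spliterator.SORTED))   sb.append("SORTED ");
        if (s.hasCharacteristics(Spliterator.DISTINCT)) sb.append("DISTINCT ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        TreeSet<Integer> set = new TreeSet<>(list);
        System.out.println("ArrayList: " + describe(list.spliterator()));
        System.out.println("TreeSet:   " + describe(set.spliterator()));
    }
}
```

These flags are what the framework reads when it decides how to split the work and which pipeline stages it can skip.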
The next best way is via one of the factories in Streams -- things like
intRange, iterate, generate. These are more flexible than they first
appear; for example, if you have a function int -> T, and you want to
generate a sequence of f(0), f(1), ... f(n) in a parallel-friendly way,
you can just do:
intRange(0, n).map(f);
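For reference, a rough equivalent of the one-liner above in the API as it later shipped in Java 8, where the range factory is IntStream.range and mapping an int to an object is mapToObj. The class name RangeMapSketch and the method generate() are illustrative only. Note that range's upper bound is exclusive, so this yields f(0) through f(n-1).

```java
import java.util.List;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RangeMapSketch {
    // Applies f to 0, 1, ..., n-1. The range splits evenly, so this
    // parallelizes well; collecting to a list keeps encounter order.
    static <T> List<T> generate(int n, IntFunction<T> f) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(f)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(generate(5, i -> "f(" + i + ")"));
    }
}
```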
The next best way is via a Spliterator that properly declares its
properties, is SIZED, SUBSIZED, and has a good trySplit implementation.
These will ensure that things decompose well. Many of the JDK
spliterators have these characteristics.
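As a minimal sketch of what such a spliterator can look like, here is one over an array range that reports SIZED and SUBSIZED and splits in half. It is written against the Java 8 Spliterator interface as shipped; ArrayRangeSpliterator is a made-up name, and the JDK's own array spliterators are more careful than this.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

class ArrayRangeSpliterator<T> implements Spliterator<T> {
    private final T[] array;
    private int index;        // next element to hand out
    private final int fence;  // one past the last element

    ArrayRangeSpliterator(T[] array, int origin, int fence) {
        this.array = array;
        this.index = origin;
        this.fence = fence;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (index >= fence) return false;
        action.accept(array[index++]);
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        // Hand the first half to a new spliterator and keep the second half.
        int lo = index, mid = (lo + fence) >>> 1;
        if (lo >= mid) return null;  // too small to split
        index = mid;
        return new ArrayRangeSpliterator<>(array, lo, mid);
    }

    @Override
    public long estimateSize() { return fence - index; }  // exact, hence SIZED

    @Override
    public int characteristics() { return ORDERED | SIZED | SUBSIZED; }

    public static void main(String[] args) {
        Integer[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        int sum = StreamSupport
                .stream(new ArrayRangeSpliterator<>(data, 0, data.length), true)
                .mapToInt(Integer::intValue)
                .sum();
        System.out.println(sum);
    }
}
```

Because both halves know their exact size after every split, the framework can balance work without guessing.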
We then slide down the scale of spliterator quality; SUBSIZED is
probably the first to go, then SIZED, then trySplit. As the spliterator
quality degrades, the quality of decomposition and the opportunity for
pipeline optimization degrade too.
We then come to the bottom of the barrel, iterators. Making a
Spliterator from an iterator sucks in at least the following ways:
- Splitting will suck. We can still extract some parallelism for
high-Q problems, but it will never be good, placing a lid on how much
parallelism you can get.
- Iterators throw away a lot of useful information about the
underlying data source, such as its size. It may be that whoever wrote
the Iterator knows the size, but the Iterator does not. (We've got an
iterator+size to spliterator conversion, but that's brittle because of
"early binding" to the size information.)
- Element access overhead. One of the reasons for doing Spliterator
is that Iterator sucks so badly! (High per-element cost; two method
calls per element, often with redundant computation due to required
defensive coding; Iterator protocol often requires lookahead and
buffering; inherent race between hasNext() and next().) So you're
taking a sucky way to get elements out of a source, and wrapping it with
more junk.
So, while Iterator to Stream is still a fine last resort, putting it in
Streams will likely have the unfortunate effect of guiding users to the
worst way of making a stream, without fully understanding the tradeoffs.
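For the last-resort case, a sketch of the Iterator route using the adapters as they shipped in Java 8: Spliterators.spliteratorUnknownSize and Spliterators.spliterator(Iterator, long, int), fed to StreamSupport.stream. The class IteratorStreamSketch and its two stream() helpers are hypothetical names, not proposed API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IteratorStreamSketch {
    // Size unknown: the resulting spliterator is neither SIZED nor SUBSIZED.
    static <T> Stream<T> stream(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    // Iterator plus a caller-supplied size. The size is bound when the
    // spliterator is created, so it is wrong if the source changes afterwards.
    static <T> Stream<T> stream(Iterator<T> it, long size) {
        return StreamSupport.stream(
                Spliterators.spliterator(it, size, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> legacy = Arrays.asList("a", "b", "c");
        System.out.println(stream(legacy.iterator()).collect(Collectors.toList()));
        System.out.println(stream(legacy.iterator(), legacy.size()).count());
    }
}
```

Both helpers build sequential streams here; passing true instead of false asks for a parallel one, with the splitting limits described above.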
On 4/13/2013 12:06 PM, Tim Peierls wrote:
> Doesn't that seem like something that belongs in Streams? If you're
> stuck with a legacy API that exposes Iterator but not Iterable, you'd
> still want to be able to make a Stream out of it, and you wouldn't want
> to have to look in StreamSupport for that. It's a lot different from
> stream(Spliterator).
>
> On Sat, Apr 13, 2013 at 11:24 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Currently StreamSupport contains seq/par versions of
> stream(Spliterator)
> stream(Supplier<Spliterator>)
> for ref/int/long/double.
>
> In java.util.Spliterators, there are adapters to turn an Iterator
> into a Spliterator.
>
> I think we should add convenience factories for
>
> stream(Iterator)
>
> to StreamSupport as well.
>
>