Stream.generate redux
Joe Bowbeer
joe.bowbeer at gmail.com
Fri May 10 09:33:23 PDT 2013
Concerning the question regarding Long.MAX_VALUE values, I bet this is
enough for most use cases, but is that the right question -- when one is
dealing with an infinite stream?
I "might" be OK with limiting the number of values in the case of primitive
streams, because of the inherent constraints on primitive values, but I
would not want to limit the number of values generated for an infinite
non-primitive Stream. Btw, what would happen after Long.MAX_VALUES were
generated? The stream-driven computation would stop?
Joe
On Fri, May 10, 2013 at 8:53 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> We have three choices for implementations for
> Int/Long/Double/Stream.generate. Note that the expectation is
> Stream.generate will be used in conjunction with a short-circuiting
> operation, such as limit, and generate serves as the implementation for
> generating streams of random numbers with Random.
>
> The 3 choices:
>
> 1) infinite, ordered, iterator-based, right-balanced tree
>
> This is the current state of affairs and tends to parallelize poorly due
> to it being iterator-based, which results in the production of
> right-balanced (unbalanced) trees.
>
> Infinite is potentially a desirable property, but infinite and
> right-balanced combined are a poor combination.
>
> Ordered is not intrinsically bad. It depends on what constraints are
> applied to invoking the supplier. If the constraint is the supplier may be
> called concurrently or in no particular order then the stream should not
> report any encounter order since non-deterministic results could be
> produced, contradicting that the stream has an encounter order.
>
> Consider this example:
>
> LongStream.range(0, Long.MAX_VALUE).parallel().map(i -> supplier.get());
>
> The stream input to the map has an encounter but the supplier is called in
> a temporal order, so the elements in the stream output from the map will be
> jumbled up in no particular order.
>
> It just so happens for 1) that the supplier is not called concurrently,
> this is a consequence of being an iterator-based supplier.
>
>
> 2) finite of known size, unordered, using LongStream.range(0,
> Long.MAX_VALUE).map(i -> s.get()).unordered()
>
> A maximum of Long.MAX_VALUE elements will be generated.
>
> We need to explicitly make the stream unordered, since map, reasonably,
> preserves order, assuming there is a correlation between the input and
> output.
>
> This implementation can now be combined with limit with out throwing
> OOMEs, since we recently optimized limit with sized streams in the lambda
> repo.
>
>
> 3) infinite, unordered, using infinite supplying spliterator
>
> This can also be combined with limit with out throwing OOMEs, pending
> changes to the limit implementation (I have a patch). Previously i thought
> it would be tricky but Brian and I have managed to optimize the limit case
> for unordered streams.
>
>
> I like the fact that 3) is now possible :-) there is a price to be paid
> though for limiting since it requires some buffering of elements and CASing
> on an AtomicLong (optimistically buffering reduces the number of CASes). 2)
> requires no buffering or use of concurrent data structures.
>
> So... i think it primarily comes down to:
>
> finite; or infinite with some extra cost
>
> ?
>
> Is Long.MAX_VALUE generated elements sufficient for most needs?
>
> FWIW perhaps an infinite source of generated values makes more sense if
> there were additional ways of short-circuiting such as cancellation (as we
> previously had and could add back) or if there are a limitWhile/takeWhile
> operations.
>
> Paul.
More information about the lambda-libs-spec-observers
mailing list