Stream.generate redux

Fri May 10 08:53:45 PDT 2013

We have three choices for implementations for Int/Long/Double/Stream.generate. Note that the expectation is Stream.generate will be used in conjunction with a short-circuiting operation, such as limit, and generate serves as the implementation for generating streams of random numbers with Random.

The 3 choices:

1) infinite, ordered, iterator-based, right-balanced tree

This is the current state of affairs and tends to parallelize poorly due to it being iterator-based, which results in the production of right-balanced (unbalanced) trees. 

Infinite is potentially a desirable property, but infinite and right-balanced combined are a poor combination.

Ordered is not intrinsically bad. It depends on what constraints are applied to invoking the supplier. If the constraint is the supplier may be called concurrently or in no particular order then the stream should not report any encounter order since non-deterministic results could be produced, contradicting that the stream has an encounter order.

Consider this example:

  LongStream.range(0, Long.MAX_VALUE).parallel().map(i -> supplier.get());

The stream input to the map has an encounter but the supplier is called in a temporal order, so the elements in the stream output from the map will be jumbled up in no particular order.

It just so happens for 1) that the supplier is not called concurrently, this is a consequence of being an iterator-based supplier.

2) finite of known size, unordered, using LongStream.range(0, Long.MAX_VALUE).map(i -> s.get()).unordered()

A maximum of Long.MAX_VALUE elements will be generated.

We need to explicitly make the stream unordered, since map, reasonably, preserves order, assuming there is a correlation between the input and output.

This implementation can now be combined with limit with out throwing OOMEs, since we recently optimized limit with sized streams in the lambda repo.

3) infinite, unordered, using infinite supplying spliterator

This can also be combined with limit with out throwing OOMEs, pending changes to the limit implementation (I have a patch).  Previously i thought it would be tricky but Brian and I have managed to optimize the limit case for unordered streams.

I like the fact that 3) is now possible :-) there is a price to be paid though for limiting since it requires some buffering of elements and CASing on an AtomicLong (optimistically buffering reduces the number of CASes). 2) requires no buffering or use of concurrent data structures.

So... i think it primarily comes down to: 

  finite; or infinite with some extra cost

?

Is Long.MAX_VALUE generated elements sufficient for most needs?

FWIW perhaps an infinite source of generated values makes more sense if there were additional ways of short-circuiting such as cancellation (as we previously had and could add back) or if there are a limitWhile/takeWhile operations.

Paul.