RFR: 8072727 - add variation of Stream.iterate() that's finite

Mon Feb 15 01:03:15 UTC 2016

Hi Tagir,

Thanks for picking this up.

I'll be at a conference this week and I won't have much time to discuss this in 
detail until afterward. But here are some quick thoughts about the proposal.

I'd suggest focusing on the API first before worrying about how to track the 
stream state with booleans, etc. Is the API convenient to use, and how well does 
it support the use cases we envision for it?

In particular, I can imagine a number of cases where it would be very helpful to 
be able to support an empty stream, or where the computation to produce the 
first element is the same as the computation to produce subsequent elements. 
Requiring a value for the first stream element is at odds with that.

Here are some ideas for use cases to try out:

  - a series of dice rolls representing a round of craps [1]
  - elements drawn from a queue until the queue is empty or until
    a sentinel is reached
  - a sequence of numbers that (probably) terminates but whose length
    isn't necessarily known in advance (e.g. Collatz sequence [2])

[1] https://en.wikipedia.org/wiki/Craps

[2] https://en.wikipedia.org/wiki/Collatz_conjecture

Note that in some cases the sentinel value that terminates the stream should be 
part of the stream, and in other cases it's not.

I'm sure you can find more uses cases by perusing Stack Overflow. :-)

I'm a bit skeptical of the use of "iterate" for producing a finite stream. There 
are the usual issues with overloading, but there's also potential confusion as 
some forms of iterate() are infinite and others finite. I'll suggest the name 
"produce" instead, but there are surely better terms.

One thing to think about is where the state of the producer is stored. Is it 
expected to be in an argument that's passed to each invocation of the functional 
argument, or is it expected to be captured? I don't think there's an answer in 
isolation; examining use cases would probably shed some light here.

Here are a few API ideas (wildcards elided):

--

<T> Stream<T> iterate(T seed, Predicate<T> predicate, UnaryOperator<T> f)

The API from your proposal, for comparison purposes.

--

<T> Stream<T> produce(Supplier<Optional<T>>)

Produces elements until empty Optional is returned. This box/unboxes every 
element, maybe(?) alleviated by Valhalla.

--

<T> Stream<T> produce(BooleanSupplier, Supplier<T>)

Calls the BooleanSupplier; if true the next stream element is what's returned by 
calling the Supplier. If BooleanSupplier returns false, end of stream. If you 
have an iterator already, this enables

produce(iterator::hasNext, iterator::next)

But if you don't have an iterator already, coming up with the functions to 
satisfy the iterator-style protocol is sometimes painful.

--

<T> Stream<T> produce(Predicate<Consumer<T>> advancer)

This has an odd signature, but the function is like Spliterator.tryAdvance(). It 
must either call the consumer once and return true, or return false without 
calling the consumer.

--

<T> Stream<T> produce(Consumer<Consumer<T>> advancer)

A variation of the above, without a boolean return. The advancer calls the 
consumer one or more times to add elements to the stream. End of stream occurs 
when the advancer doesn't call the consumer.

--

<T> Stream<T> produce(Supplier<Stream<T>>)

A variation of Supplier<Optional<T>> where the supplier returns a stream 
containing zero or more elements. The stream terminates if the supplier returns 
an empty stream. There "boxing" overhead here, but we don't seem to be bothered 
by this with flatMap().

--

s'marks

On 2/14/16 6:53 AM, Tagir F. Valeev wrote:
> Hello!
>
> I wanted to work on foldLeft, but Brian asked me to take this issue
> instead. So here's webrev:
> http://cr.openjdk.java.net/~tvaleev/webrev/8072727/r1/
>
> I don't like iterator-based Stream source implementations, so I made
> them AbstractSpliterator-based. I also implemented manually
> forEachRemaining as, I believe, this improves the performance in
> non-short-circuiting cases.
>
> I also decided to keep two flags (started and finished) to track the
> state. Currently existing implementation of infinite iterate() does
> not use started flag, but instead reads one element ahead for
> primitive streams. This seems wrong to me and may even lead to
> unexpected exceptions (*). I could get rid of "started" flag for
> Stream.iterate() using Streams.NONE, but this would make object
> implementation different from primitive implementations. It would also
> be possible to keep single three-state variable (byte or int,
> NOT_STARTED, STARTED, FINISHED), but I doubt that this would improve
> the performance or footprint. Having two flags looks more readable to
> me.
>
> Currently existing two-arg iterate methods can now be expressed as a
> partial case of the new method:
>
> public static<T> Stream<T> iterate(final T seed, final UnaryOperator<T> f) {
>      return iterate(seed, x -> true, f);
> }
> (same for primitive streams). I may do this if you think it's
> reasonable.
>
> I created new test class and added new iterate sources to existing
> data providers.
>
> Please review and sponsor!
>
> With best regards,
> Tagir Valeev.
>
> (*) Consider the following code:
>
> int[] data = {1,2,3,4,-1};
> IntStream.iterate(0, x -> data[x])
>           .takeWhile(x -> x >= 0)
>           .forEach(System.out::println);
>
> Currently this unexpectedly throws an AIOOBE, because
> IntStream.iterate unnecessarily tries to read one element ahead.
>