RFR: 8072727 - add variation of Stream.iterate() that's finite
Tagir F. Valeev
amaembo at gmail.com
Wed Feb 17 05:48:35 UTC 2016
Hello, Stuart!
Thank you for your comments.
SM> I'd suggest focusing on the API first before worrying about how to track the
SM> stream state with booleans, etc. Is the API convenient to use, and how well does
SM> it support the use cases we envision for it?
As Brian already noted, the most benefit of such signature is the
resemblance to the good old for loop. Also it's good, because the
lambdas don't need to maintain external mutable state in this case
(the state is encapsulated in the current element). Most of your
proposed examples, however, need to do it as they don't receive the
existing state. Also I see no reason to create a method which is
compatible with iterator::hasNext/iterator::next or even
spliterator::tryAdvance. If you already have a spliterator, you can
create a stream using StreamSupport.stream(spliterator, false). If you
have an iterator, you can convert it to spliterator using
Spliterators.spliterator[UnknownSize]. Well, this probably looks ugly,
but more flexible and signals that some low-level stuff is performed.
Supplier<Stream<T>> is definitely bad in terms of performance.
Creating a new stream is not nearly free. To illustrate this I wrote a
simple benchmark which compares .flatMapToInt(OptionalInt::stream)
with Java8 way .filter(OptionalInt::isPresent).mapToInt(OptionalInt::getAsInt)
Here's source code and full output:
http://cr.openjdk.java.net/~tvaleev/jmh/optionalstream/
Benchmark (n) Mode Cnt Score Error Units
OptionalTest.testOptionalFilterGet 10 avgt 30 0,171 ± 0,011 us/op
OptionalTest.testOptionalFilterGet 1000 avgt 30 6,295 ± 0,046 us/op
OptionalTest.testOptionalFilterGet 1000000 avgt 30 12597,706 ± 69,214 us/op
OptionalTest.testOptionalStream 10 avgt 30 0,330 ± 0,002 us/op
OptionalTest.testOptionalStream 1000 avgt 30 27,552 ± 0,577 us/op
OptionalTest.testOptionalStream 1000000 avgt 30 30837,240 ± 812,420 us/op
Involving intermediate streams makes the thing at least twice slower.
Surely this delay could become negligible in some scenarios, but I
think it's inacceptable to enforce users to create new source with a
bunch of streams. At least primitive specializations will become
meaningless in this case: boxing would eat much less time compared to
stream creation.
As for elements drawn from the queue, it's much better to use existing
takeWhile method:
queue.stream().takeWhile(x -> x.equals(sentinel));
True, such approach will not include the sentinel element to the
result, and there's no easy way to do it with current API. Probably
additional method (takeWhileInclusive?) could be considered to solve
such problems. Still, I think, drawing from the queue is not the
problem which should be solved with new iterate() method.
As for Collatz conjecture, it's quite easy to iterate without trailing
one:
IntStream.iterate(start, val -> val != 1,
val -> val % 2 == 0 ? val / 2 : val * 3 + 1)
If having one is desired, then it would be easier just to append one
to the stream (even if Collatz conjecture is false we will have an
infinite stream, so appended one will never appear):
IntStream.concat(
IntStream.iterate(start, val -> val != 1,
val -> val % 2 == 0 ? val / 2 : val * 3 + 1),
IntStream.of(1))
A side note: having IntStream.append(int... numbers) would be really
nice:
IntStream.iterate(start, val -> val != 1,
val -> val % 2 == 0 ? val / 2 : val * 3 + 1).append(1)
Another approach would be to introduce a special stop value (for
example, -1):
IntStream.iterate(start, val -> val != -1,
val -> val == 1 ? -1 : val % 2 == 0 ? val / 2 : val * 3 + 1)
This stream produces Collatz series, including the trailing one.
As for Craps, I never heard about such game. If I understood the rules
correctly, it's good to represent the state as separate object and
define state transition via its method. Something like this should
work:
Random r = new Random();
IntSupplier dice = () -> r.nextInt(6)+r.nextInt(6)+2;
class State {
int roll, point;
State(int roll, int point) {
this.roll = roll;
this.point = point;
}
State() {
this(dice.getAsInt(), 0);
}
boolean isStopRound() {
return roll == 7 || (point == 0 && (roll > 10 || roll < 4)) || (point != 0 && roll == point);
}
State next() {
return isStopRound() ? null : new State(dice.getAsInt(), point == 0 ? roll : point);
}
}
Stream.iterate(new State(), Objects::nonNull, State::next)
.mapToInt(state -> state.roll)
.forEach(System.out::println);
With best regards,
Tagir Valeev.
SM> In particular, I can imagine a number of cases where it would be very helpful to
SM> be able to support an empty stream, or where the computation to produce the
SM> first element is the same as the computation to produce subsequent elements.
SM> Requiring a value for the first stream element is at odds with that.
SM> Here are some ideas for use cases to try out:
SM> - a series of dice rolls representing a round of craps [1]
SM> - elements drawn from a queue until the queue is empty or until
SM> a sentinel is reached
SM> - a sequence of numbers that (probably) terminates but whose length
SM> isn't necessarily known in advance (e.g. Collatz sequence [2])
SM> [1] https://en.wikipedia.org/wiki/Craps
SM> [2] https://en.wikipedia.org/wiki/Collatz_conjecture
SM> Note that in some cases the sentinel value that terminates the stream should be
SM> part of the stream, and in other cases it's not.
SM> I'm sure you can find more uses cases by perusing Stack Overflow. :-)
SM> I'm a bit skeptical of the use of "iterate" for producing a finite stream. There
SM> are the usual issues with overloading, but there's also potential confusion as
SM> some forms of iterate() are infinite and others finite. I'll suggest the name
SM> "produce" instead, but there are surely better terms.
SM> One thing to think about is where the state of the producer is stored. Is it
SM> expected to be in an argument that's passed to each invocation of the functional
SM> argument, or is it expected to be captured? I don't think there's an answer in
SM> isolation; examining use cases would probably shed some light here.
SM> Here are a few API ideas (wildcards elided):
SM> --
SM> <T> Stream<T> iterate(T seed, Predicate<T> predicate, UnaryOperator<T> f)
SM> The API from your proposal, for comparison purposes.
SM> --
SM> <T> Stream<T> produce(Supplier<Optional<T>>)
SM> Produces elements until empty Optional is returned. This box/unboxes every
SM> element, maybe(?) alleviated by Valhalla.
SM> --
SM> <T> Stream<T> produce(BooleanSupplier, Supplier<T>)
SM> Calls the BooleanSupplier; if true the next stream element is what's returned by
SM> calling the Supplier. If BooleanSupplier returns false, end of stream. If you
SM> have an iterator already, this enables
SM> produce(iterator::hasNext, iterator::next)
SM> But if you don't have an iterator already, coming up with the functions to
SM> satisfy the iterator-style protocol is sometimes painful.
SM> --
SM> <T> Stream<T> produce(Predicate<Consumer<T>> advancer)
SM> This has an odd signature, but the function is like Spliterator.tryAdvance(). It
SM> must either call the consumer once and return true, or return false without
SM> calling the consumer.
SM> --
SM> <T> Stream<T> produce(Consumer<Consumer<T>> advancer)
SM> A variation of the above, without a boolean return. The advancer calls the
SM> consumer one or more times to add elements to the stream. End of stream occurs
SM> when the advancer doesn't call the consumer.
SM> --
SM> <T> Stream<T> produce(Supplier<Stream<T>>)
SM> A variation of Supplier<Optional<T>> where the supplier returns a stream
SM> containing zero or more elements. The stream terminates if the supplier returns
SM> an empty stream. There "boxing" overhead here, but we don't seem to be bothered
SM> by this with flatMap().
SM> --
SM> s'marks
SM> On 2/14/16 6:53 AM, Tagir F. Valeev wrote:
>> Hello!
>>
>> I wanted to work on foldLeft, but Brian asked me to take this issue
>> instead. So here's webrev:
>> http://cr.openjdk.java.net/~tvaleev/webrev/8072727/r1/
>>
>> I don't like iterator-based Stream source implementations, so I made
>> them AbstractSpliterator-based. I also implemented manually
>> forEachRemaining as, I believe, this improves the performance in
>> non-short-circuiting cases.
>>
>> I also decided to keep two flags (started and finished) to track the
>> state. Currently existing implementation of infinite iterate() does
>> not use started flag, but instead reads one element ahead for
>> primitive streams. This seems wrong to me and may even lead to
>> unexpected exceptions (*). I could get rid of "started" flag for
>> Stream.iterate() using Streams.NONE, but this would make object
>> implementation different from primitive implementations. It would also
>> be possible to keep single three-state variable (byte or int,
>> NOT_STARTED, STARTED, FINISHED), but I doubt that this would improve
>> the performance or footprint. Having two flags looks more readable to
>> me.
>>
>> Currently existing two-arg iterate methods can now be expressed as a
>> partial case of the new method:
>>
>> public static<T> Stream<T> iterate(final T seed, final UnaryOperator<T> f) {
>> return iterate(seed, x -> true, f);
>> }
>> (same for primitive streams). I may do this if you think it's
>> reasonable.
>>
>> I created new test class and added new iterate sources to existing
>> data providers.
>>
>> Please review and sponsor!
>>
>> With best regards,
>> Tagir Valeev.
>>
>> (*) Consider the following code:
>>
>> int[] data = {1,2,3,4,-1};
>> IntStream.iterate(0, x -> data[x])
>> .takeWhile(x -> x >= 0)
>> .forEach(System.out::println);
>>
>> Currently this unexpectedly throws an AIOOBE, because
>> IntStream.iterate unnecessarily tries to read one element ahead.
>>
More information about the core-libs-dev
mailing list