Streams.generate: infinite or finite?
Paul Sandoz
paul.sandoz at oracle.com
Tue Apr 16 01:13:12 PDT 2013
On Apr 12, 2013, at 11:18 PM, Jim Mayer <jim at pentastich.org> wrote:
> Perhaps I'm missing something, but what makes it hard to do a
> high performance parallel split of an infinite stream? Sure, you can't
> break it up into even chunks, but is that really all that different than
> breaking up Long.MAX_VALUE values? On an 8 core machine you're not going
> to break them up into 2^60 element chunks either, right?
I think comes down to resource management. There is no ideal solution but some solutions might be more efficient than others.
Splitting from an iterator produces a right-balanced tree of a depth that increases as one keeps splitting. Each left-split is created by copying elements into an array. The current implementation will produce a right-balanced tree of depth 128 for 2^32 elements.
With Streams.longRange(0, Long.MAX_VALUE) the maximum depth will be 63, the tree is balanced, and no buffering is required when splitting. But to work effectively we require a wrapping spliterator that performs the limit operation and discards underlying splits that are not in the required range. We probably need to bias splits for such large ranges say 1:7 or 1:15 so that is quicker to get to the left but without creating trees of unduly large depth [*]. The assumption being the range to limit will often be 0 to something much much less than 2^63 - 1.
So while the latter is not perfect i think it can avoid some costs of the former.
Brian suggested an idea of:
nulls()
that just keeps producing nulls, thus generate would be nulls().map(n -> s.get()).
I think it possible to for nulls() to use Streams.longRange(0, Long.MAX_VALUE) to get a truly infinite stream. i.e. the left split of nulls() does the equivalent of Streams.longRange(0, Long.MAX_VALUE) and the right split is of unknown size. However, practically i wonder if this is really worth it?
Paul.
[*] It is not possible to hint to a Spliterator.trySplit to bias the splitting.
2 4 8 16 32 64 128 256 1024
>
> Yours confusedly,
>
> Jim Mayer
>
>
> On Fri, Apr 12, 2013 at 10:50 AM, Paul Sandoz <paul.sandoz at oracle.com>wrote:
>
>> Hi,
>>
>> Currently Streams.generate produces an infinite stream. This is
>> theoretically nice but splits poorly (right-balanced trees).
>>
>> Implementation-wise Streams.generate creates a spliterator from an
>> iterator:
>>
>> public static<T> Stream<T> generate(Supplier<T> s) {
>> Objects.requireNonNull(s);
>> InfiniteIterator<T> iterator = s::get;
>> return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
>> iterator,
>> Spliterator.ORDERED | Spliterator.IMMUTABLE));
>> }
>>
>> The method is used in java.util.Random:
>>
>> public IntStream ints() {
>> return Streams.generateInt(this::nextInt);
>> }
>>
>> There might be a nasty surprise in store for developers that expect the
>> randomly generated stream of int values to have the best parallel
>> performance.
>>
>>
>> We can change Streams.generate to be finite (or not know to be finite in
>> the time allotted to do some computation) by implementing as follows:
>>
>> public static<T> Stream<T> generate(Supplier<T> s) {
>> return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get());
>> }
>>
>> This will yield better parallel performance because the splits are
>> balanced.
>>
>> We can further change to:
>>
>> public static<T> Stream<T> generate(Supplier<T> s) {
>> return Streams.longs().mapToObj(i -> s.get());
>> }
>>
>> if we introduce the longs() idiom.
>>
>>
>> I think we should go finite! and add Streams.longs(). Agree? or disagree?
>>
>> Then it is actually questionable if Streams.generate should exist at all.
>> It does have some pedagogic value since the idiom Streams.longs().map() may
>> not be obvious. So i would be mostly inclined to keep it for that reason.
>>
>> Paul.
More information about the lambda-libs-spec-observers
mailing list