Streams.generate: infinite or finite?

Paul Sandoz paul.sandoz at oracle.com
Tue Apr 16 01:13:12 PDT 2013


On Apr 12, 2013, at 11:18 PM, Jim Mayer <jim at pentastich.org> wrote:

> Perhaps I'm missing something, but what makes it hard to do a
> high performance parallel split of an infinite stream?  Sure, you can't
> break it up into even chunks, but is that really all that different than
> breaking up Long.MAX_VALUE values?  On an 8 core machine you're not going
> to break them up into 2^60 element chunks either, right?

I think comes down to resource management. There is no ideal solution but some solutions might be more efficient than others.

Splitting from an iterator produces a right-balanced tree of a depth that increases as one keeps splitting. Each left-split is created by copying elements into an array. The current implementation will produce a right-balanced tree of depth 128 for 2^32 elements.

With Streams.longRange(0, Long.MAX_VALUE) the maximum depth will be 63, the tree is balanced, and no buffering is required when splitting. But to work effectively we require a wrapping spliterator that performs the limit operation and discards underlying splits that are not in the required range. We probably need to bias splits for such large ranges say 1:7 or 1:15 so that is quicker to get to the left but without creating trees of unduly large depth [*]. The assumption being the range to limit will often be 0 to something much much less than 2^63 - 1.

So while the latter is not perfect i think it can avoid some costs of the former.

Brian suggested an idea of:

 nulls()

that just keeps producing nulls, thus generate would be nulls().map(n -> s.get()).

I think it possible to for nulls() to use Streams.longRange(0, Long.MAX_VALUE) to get a truly infinite stream. i.e. the left split of nulls() does the equivalent of Streams.longRange(0, Long.MAX_VALUE) and the right split is of unknown size. However, practically i wonder if this is really worth it?

Paul.

[*] It is not possible to hint to a Spliterator.trySplit to bias the splitting.


2 4 8 16 32 64 128 256 1024 
> 
> Yours confusedly,
> 
> Jim Mayer
> 
> 
> On Fri, Apr 12, 2013 at 10:50 AM, Paul Sandoz <paul.sandoz at oracle.com>wrote:
> 
>> Hi,
>> 
>> Currently Streams.generate produces an infinite stream. This is
>> theoretically nice but splits poorly (right-balanced trees).
>> 
>> Implementation-wise Streams.generate creates a spliterator from an
>> iterator:
>> 
>>    public static<T> Stream<T> generate(Supplier<T> s) {
>>        Objects.requireNonNull(s);
>>        InfiniteIterator<T> iterator = s::get;
>>        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
>>                iterator,
>>                Spliterator.ORDERED | Spliterator.IMMUTABLE));
>>    }
>> 
>> The method is used in java.util.Random:
>> 
>>    public IntStream ints() {
>>        return Streams.generateInt(this::nextInt);
>>    }
>> 
>> There might be a nasty surprise in store for developers that expect the
>> randomly generated stream of int values to have the best parallel
>> performance.
>> 
>> 
>> We can change Streams.generate to be finite (or not know to be finite in
>> the time allotted to do some computation) by implementing as follows:
>> 
>>    public static<T> Stream<T> generate(Supplier<T> s) {
>>      return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get());
>>    }
>> 
>> This will yield better parallel performance because the splits are
>> balanced.
>> 
>> We can further change to:
>> 
>>    public static<T> Stream<T> generate(Supplier<T> s) {
>>      return Streams.longs().mapToObj(i -> s.get());
>>    }
>> 
>> if we introduce the longs() idiom.
>> 
>> 
>> I think we should go finite! and add Streams.longs().  Agree? or disagree?
>> 
>> Then it is actually questionable if Streams.generate should exist at all.
>> It does have some pedagogic value since the idiom Streams.longs().map() may
>> not be obvious. So i would be mostly inclined to keep it for that reason.
>> 
>> Paul.



More information about the lambda-libs-spec-observers mailing list