Combining streams

Michael Hixson michael.hixson at gmail.com
Fri Jul 26 15:15:42 PDT 2013


On Fri, Jul 26, 2013 at 8:29 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> Hi Michael,
>
> These are good questions; what we are currently lacking is a description of the Stream API to help build up a mental model. However, I am not sure JavaDoc is the best place for that.
>
>
> On Jul 26, 2013, at 3:52 AM, Michael Hixson <michael.hixson at gmail.com> wrote:
>
>> I have a couple of questions about combining streams.  Setting aside
>> the question of which code is easier to read, I am wondering about
>> things like performance or other hidden gotchas.
>>
>> 1. To combine two streams, I should do this:
>>
>> Stream.concat(s1, s2)
>>
>> not this:
>>
>> Stream.of(s1, s2).flatMap(x -> x)
>>
>> ... right?
>>
>
> The former will preserve some properties of both streams if it can: specifically, if both streams are sized, the output is sized and certain optimizations can kick in, whereas flatMap will clear any size properties and thus such optimizations are no longer applicable.
>
> The flatMap operation is also less efficient since it requires more work per element.
>
>
>> 2. To combine more than two streams, I should do this:
>>
>> Stream.of(s1, s2, s3, ...).flatMap(x -> x)
>>
>> not this:
>>
>> Stream.of(s1, s2, s3, ...).reduce(Stream.empty(), Stream::concat)
>>
>> ... right?
>>
>
> This one might be somewhat trickier to evaluate in terms of performance. I find the former more elegant, but, as Brian points out, it will not work for infinite streams, since the implementation does s.forEach(...).
>
> Sequentially the latter is equivalent to:
>
>    Stream.concat(Stream.concat(Stream.concat(Stream.empty(), s1), s2), s3)
>
> So there can be lots of wrapping going on. For parallel evaluation the wrapping is likely to be more prevalent, since it will mirror that of the computation tree.

Ok.  I thought flatMap was the way to go on #2 because it seemed
more... "direct".  It also appears to be faster, but if it has worse
semantics then it's not worth it.
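
If I'm reading the sized-ness point for #1 right, the difference should be
easy to see with a quick check along these lines (just a sketch, with
java.util.Spliterator and java.util.stream.Stream imported; I'd expect the
first line printed to be true and the second false):

  Stream<Integer> a = Stream.of(1, 2, 3);
  Stream<Integer> b = Stream.of(4, 5);

  // Stream.concat should keep SIZED when both inputs are sized...
  System.out.println(Stream.concat(a, b)
      .spliterator()
      .hasCharacteristics(Spliterator.SIZED));

  Stream<Integer> c = Stream.of(1, 2, 3);
  Stream<Integer> d = Stream.of(4, 5);

  // ...while the flatMap version should have dropped it.
  System.out.println(Stream.of(c, d).flatMap(x -> x)
      .spliterator()
      .hasCharacteristics(Spliterator.SIZED));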

I have a vague, maybe baseless notion that the reduce+concat code
produces a stream that looks like a perfectly unbalanced tree.  Like
this:

 /  \
s1 /  \
  s2 /  \
    s3 /  \
      ...

Also in my head, I imagine the flatMap code producing a flat tree like this:

 /  |  |  |
s1  s2 s3 ...

And I imagine that second shape is better than the first, for some
reason.  Are my suspicions valid?  If I were to write my own varargs
concat method, would it be worth trying to make a balanced tree?

This seems to perform better than reduce+concat, though not quite as well
as flatMap.

  @SafeVarargs  // the incoming array is only read, never modified or exposed
  public static <T> Stream<T> concat(Stream<T>... streams) {
    Objects.requireNonNull(streams);
    return concat2(streams, 0, streams.length);
  }

  // Concatenates streams[fromIndex, toIndex) as a balanced binary tree
  // of Stream.concat calls.
  private static <T> Stream<T> concat2(Stream<T>[] streams,
                                       int fromIndex,
                                       int toIndex) {
    int width = toIndex - fromIndex;
    switch (width) {
      case 0:
        return Stream.empty();
      case 1:
        return streams[fromIndex];
      case 2:
        return Stream.concat(streams[fromIndex],
                             streams[fromIndex + 1]);
      default:
        // Split roughly in half, giving the extra stream to the left
        // half when the width is odd.
        int split = fromIndex + width / 2;
        if (width % 2 == 1) split++;
        return Stream.concat(concat2(streams, fromIndex, split),
                             concat2(streams, split, toIndex));
    }
  }
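
For what it's worth, here's the sort of call I have in mind (assuming the
two methods above live in a utility class, say StreamUtils; that class name
is just for illustration, and the usual java.util / java.util.stream imports
apply):

  List<Integer> combined =
      StreamUtils.concat(Stream.of(1, 2), Stream.of(3), Stream.of(4, 5, 6))
                 .collect(Collectors.toList());
  // combined should be [1, 2, 3, 4, 5, 6], assembled from a balanced tree
  // of Stream.concat calls rather than one long lopsided chain.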

(Probably writing my own spliterator like Streams.ConcatSpliterator
would be a better solution, but that looks really complicated.)

-Michael

>
> IIRC the EG may have touched upon a var-arg concat, but I suspect it fell through the gaps.
>
> Paul.
>
>

