flatMap performance <was> Re: Additional method on Stream

Tue Apr 28 12:30:04 UTC 2015

On Apr 28, 2015, at 12:57 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:

> Hi Peter,
> 
> You are correct in stating that flatMap has some overhead. 
> 
> There are optimizations in place for operating on one element and on the head of the stream that reduce the overhead.

I believe that, at least in the micro-benchmark cases, I can reduce the flatMap time by about 60% with special stream impls for zero and one element, rather than leveraging the stream builder directly. They are not that difficult to write if one uses a pattern like the following:

static final class StreamOfOne<T> implements Stream<T> {
    boolean consumed;
    final T t;

    public StreamOfOne(T t) {
        this.t = t;
    }

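    // Enforce single use, as with any other stream.
    void consumed() {
        if (consumed) throw new IllegalStateException();
        consumed = true;
    }

    // Slow path: hand the element over to a regular stream.
    Stream<T> fork() {
        consumed();
        return StreamSupport.stream(new Streams.StreamBuilderImpl<>(t), false);
    }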

...

    @Override
    public Stream<T> filter(Predicate<? super T> predicate) {
        return fork().filter(predicate);
    }

...

    @Override
    public void forEach(Consumer<? super T> action) {
        consumed();
        action.accept(t);
    }

...
}

There is a cost if intermediate operations or certain terminal operations (like collect) are invoked, though that cost can be reduced by merging in a Spliterator implementation. For the common cases of simpler terminal operations, with flatMap, it's a win. So I am pondering adding such implementations.
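To make the trade-off concrete outside the JDK, here is a minimal sketch under stated assumptions. OneShot is a hypothetical, simplified holder that does not implement Stream<T>; it only mirrors the two methods shown above. Because Streams.StreamBuilderImpl is not accessible outside java.util.stream, fork() uses the public Stream.of(t) instead, which I believe builds a single-element stream in much the same way in JDK 8.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical illustration only: not the proposed JDK class.
final class OneShot<T> {
    private boolean consumed;
    private final T t;

    OneShot(T t) {
        this.t = t;
    }

    // Enforce single use, mirroring the one-shot contract of streams.
    private void consumed() {
        if (consumed) throw new IllegalStateException("already consumed");
        consumed = true;
    }

    // Slow path: intermediate operations fall back to a regular stream.
    // Assumption: Stream.of(t) stands in for the internal stream builder.
    private Stream<T> fork() {
        consumed();
        return Stream.of(t);
    }

    Stream<T> filter(Predicate<? super T> predicate) {
        return fork().filter(predicate);
    }

    // Fast path: a simple terminal operation touches the element directly,
    // without building any pipeline.
    void forEach(Consumer<? super T> action) {
        consumed();
        action.accept(t);
    }
}
```

Calling forEach on a fresh OneShot runs the action exactly once with no pipeline objects allocated; calling filter pays for a regular stream, which is the cost described above.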

Also, FWIW, returning null rather than Stream.empty() is slightly faster. The latter results in more profiling effects.
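For anyone unsure that null is allowed here: the Stream.flatMap specification says that a null mapped stream is treated as an empty stream. A small sketch of the two equivalent mappers follows; the class and method names are made up for illustration, and it makes no claim about the relative speed, which needs a proper benchmark.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; only the flatMap behaviour is the point.
final class FlatMapNullDemo {

    // Mapper that signals "no elements" with Stream.empty().
    static Stream<Integer> mapWithEmpty(Integer i) {
        if (i % 2 == 0) return Stream.of(i, i);
        return Stream.empty();
    }

    // Mapper that signals "no elements" with null; flatMap treats it as empty.
    static Stream<Integer> mapWithNull(Integer i) {
        if (i % 2 == 0) return Stream.of(i, i);
        return null;
    }

    static List<Integer> withEmpty(List<Integer> in) {
        return in.stream()
                 .flatMap(FlatMapNullDemo::mapWithEmpty)
                 .collect(Collectors.toList());
    }

    static List<Integer> withNull(List<Integer> in) {
        return in.stream()
                 .flatMap(FlatMapNullDemo::mapWithNull)
                 .collect(Collectors.toList());
    }
}
```

Both methods return the same list for the same input, so switching a mapper from Stream.empty() to null does not change results.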

Looking at the generated code, there is still a lot of "ceremony" that one would think HotSpot would just be able to do away with, given the temporary Stream objects (me of course not understanding the intricate details of the C2 compiler).

Paul.