EmptyStream to boost performance

Dr Heinz M. Kabutz heinz at javaspecialists.eu
Sat Nov 13 13:57:57 UTC 2021


Hello again,

after some excellent feedback, I have changed the EmptyStream 
implementation to contain state. This means we don't get the object 
allocation down to zero, but it is very close thanks to escape analysis. 
The speedup is impressive. For an empty ArrayList, we get the following 
improvements:

minimal - 2.13x faster:
stream.max(Integer::compare)

basic - 2.47x faster:
stream.filter(Objects::nonNull)
            .map(Function.identity())
            .max(Integer::compare)

complex - 4.75x faster:
stream.filter(Objects::nonNull)
            .map(Function.identity())
            .filter(Objects::nonNull)
            .sorted()
            .distinct()
            .max(Integer::compare)

crossover - 9.37x faster:
stream.filter(Objects::nonNull)
            .map(String::valueOf)
            .filter(s -> s.length() > 0)
            .mapToInt(Integer::parseInt)
            .map(i -> i * 2)
            .mapToLong(i -> i + 1000)
            .mapToDouble(i -> i * 3.5)
            .boxed()
            .mapToLong(Double::intValue)
            .mapToInt(d -> (int) d)
            .boxed()
            .max(Integer::compare)

Other collections, such as ConcurrentLinkedQueue, ConcurrentSkipListSet, 
CopyOnWriteArrayList and ConcurrentHashMap, show similar speedups.

There is no detectable slowdown once we have non-empty streams, since 
the only extra work in those cases is an additional if (isEmpty()) 
check. Even for concurrent collections, isEmpty() is fast.
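
To illustrate the shape of that extra check, here is a minimal sketch. It 
is not the code from the PR: the class EmptyCheckSketch and the methods 
streamOf() and maxOf() are hypothetical names, and Stream.empty() only 
stands in for the specialized EmptyStream, so this sketch shows where the 
branch sits but does not by itself avoid the pipeline allocations.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of the JDK or the PR.
class EmptyCheckSketch {

    // Mirrors the idea of checking isEmpty() before building a stream.
    // Assumption: Stream.empty() is used as a stand-in for EmptyStream.
    static <T> Stream<T> streamOf(Collection<T> c) {
        if (c.isEmpty()) {
            return Stream.empty();
        }
        return c.stream();
    }

    // The "basic" style of pipeline, run through the helper above.
    static Optional<Integer> maxOf(Collection<Integer> c) {
        return streamOf(c)
                .filter(Objects::nonNull)
                .max(Integer::compare);
    }

    public static void main(String[] args) {
        System.out.println(maxOf(new ArrayList<>()));  // prints Optional.empty
        System.out.println(maxOf(List.of(3, 7, 5)));   // prints Optional[7]
    }
}
```

For a non-empty collection the only difference from calling stream() 
directly is the single isEmpty() branch.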

There are still some issues that need to be solved, specifically lazy 
stream creation. However, besides that, the empty streams behave exactly 
as normal streams would in terms of characteristics and exceptions.
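
As a rough illustration of what "behave exactly as normal streams" 
involves, the sketch below probes two behaviors of a regular stream 
obtained from an empty ArrayList: calling an intermediate operation twice 
on the same stream object, and the return value of parallel(). The class 
and method names are hypothetical. The results reflect what current 
OpenJDK builds do; the Stream specification only says that an 
implementation may throw IllegalStateException on reuse and that 
parallel() may return itself.

```java
import java.util.ArrayList;
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical probe for illustration only; not taken from the JDK tests.
class StreamContractSketch {

    // Returns true if a second filter() on the same stream object throws.
    static boolean reuseThrows(Stream<Integer> s) {
        s.filter(Objects::nonNull);          // first use links the stream
        try {
            s.filter(Objects::nonNull);      // second use of the same object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Returns true if parallel() hands back the very same stream object.
    static boolean parallelReturnsSame(Stream<Integer> s) {
        return s.parallel() == s;
    }

    public static void main(String[] args) {
        // Both print true on current OpenJDK builds.
        System.out.println(reuseThrows(new ArrayList<Integer>().stream()));
        System.out.println(parallelReturnsSame(new ArrayList<Integer>().stream()));
    }
}
```

An empty-stream implementation that wants to be indistinguishable from 
the regular one would need to give the same answers to probes like these.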

The jdk_util tests are still not working, as they are down-casting to 
AbstractPipeline. Since documentation on that is scarce, I would 
appreciate a bit of guidance on how to fix those.

Regards

Heinz
-- 
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java™ Specialists' Newsletter" - www.javaspecialists.eu
Java Champion - www.javachampions.org
JavaOne Rock Star Speaker
Tel: +30 69 75 595 262
Skype: kabutz

On 2021/11/06 18:45, Dr Heinz M. Kabutz wrote:
> Good evening,
>
> a couple of months ago a fellow Java Champion told me that he had 
> "banned" streams at his company, or at least discouraged their use. 
> The reason was their high allocation rates with empty collections. 
> With traditional for loops, if the collection is empty, then hardly 
> any objects are allocated and it is very fast. But if we have a 
> stream, then we first have to build up the entire pipeline, only to 
> discover that we didn't need all those objects and throw them away again.
>
> When communicating with Brian Goetz last week, I mentioned this to him 
> and he suggested that perhaps we could have the stream() method inside 
> Collection check whether it is empty, and if so, to return a 
> specialized class EmptyStream that returns "this" for methods such as 
> filter() and map(). I spent a bit of time trying to write such a 
> class, together with EmptyIntStream, EmptyLongStream and 
> EmptyDoubleStream. I've also written a set of tests that compare our 
> Empty[Int|Long|Double]Streams to what would be returned with 
> [Int|Long|Double]Stream.empty(). I've also written a little benchmark 
> to demonstrate its effectiveness.
>
> You can see what I've done here:
>
> https://github.com/openjdk/jdk/pull/6275
>
> (I think I was premature in issuing the PR)
>
> However, I have hit a brick wall with the way that the streams are 
> currently being tested in the JDK. First off, there are several tests 
> that make assumptions about how Stream is implemented and down-cast 
> it to an AbstractPipeline. Since our EmptyStream is not an 
> AbstractPipeline, the tests fail.
>
> Secondly, with a normal stream, some of the methods can only be called 
> once, for example filter() and map(). They return a new stream and we 
> have to continue working with those. With my EmptyStream, since 
> filter() and map() return "this", we would not get an exception if we 
> continued using it.
>
> Thirdly, with a normal stream, the method parallel() changes the state 
> of the current stream, but then returns "this". In order to keep the 
> EmptyStream consistent with the current Stream.empty() behavior, I 
> return StreamSupport.stream(Spliterators.emptySpliterator(), true) 
> from the parallel() method. Thus the EmptyStream does the opposite of 
> what normal streams currently happen to do. The Javadocs say that the 
> parallel() method "may return itself", but it does not have to, 
> whereas the filter() method seems to suggest that it would return a new 
> stream object, but it also does not prescribe that it absolutely has 
> to.
>
> How important is the white-box testing with the streams? And could we 
> perhaps make special cases for empty streams?
>
> Regards
>
> Heinz


More information about the core-libs-dev mailing list