RFR: JDK-8277095 : Empty streams create too many objects [v2]

Thu Jul 21 14:34:01 UTC 2022

----- Original Message -----
> From: "John R Rose" <jrose at openjdk.org>
> To: core-libs-dev at openjdk.org
> Sent: Thursday, July 21, 2022 4:12:14 AM
> Subject: Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]

> On Tue, 16 Nov 2021 20:53:26 GMT, kabutz <duke at openjdk.org> wrote:
> 
>>> This is a draft proposal for how we could improve stream performance for the
>>> case where the streams are empty. Empty collections are commonplace. If we
>>> iterate over them with an Iterator, we would have to create one small Iterator
>>> object (which could often be eliminated) and if it is empty we are done.
>>> However, with Streams we first have to build up the entire pipeline, until we
>>> realize that there is no work to do. With this example, we change
>>> Collection#stream() to first check if the collection is empty, and if it is, we
>>> simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream and
>>> EmptyDoubleStream. We have taken great care for these to have the same
>>> characteristics and behaviour as the streams returned by Stream.empty(),
>>> IntStream.empty(), etc.
>>> 
>>> Some of the JDK tests fail with this, due to ClassCastExceptions (our
>>> EmptyStream is not an AbstractPipeline) and AssertionError, since we can call
>>> some methods repeatedly on the stream without it failing. On the plus side,
>>> creating a complex stream on an empty stream gives us upwards of 50x increase
>>> in performance due to a much smaller object allocation rate. This PR includes
>>> the code for the change, unit tests and also a JMH benchmark to demonstrate the
>>> improvement.
>>
>> kabutz has updated the pull request incrementally with one additional commit
>> since the last revision:
>> 
>>   Refactored empty stream implementations to reduce duplicate code and improved
>>   unordered()
>>   Added StreamSupport.empty for primitive spliterators and use that in
>>   Arrays.stream()
> 
> I agree it’s the “kind of” optimization that would be nice.  “Kind of”.
> Personally I would be happier to see complexity like this added that would
> help a larger class of common streams.
> 
> It’s a harder problem, and I know this is a case of “the best is the enemy of the
> good”, but I think a stream which has less content bulk than pipeline phases
> (according to some heuristic weighting) might possibly be handled better by
> dumping the elements into an Object array and running each phase in sequence
> over that array, rather than composing a “net result of all phases” object and
> then running it over the few elements.  Stream object creation can be reduced,
> perhaps, by building the stream around a small internal buffer that collects
> pipeline phases (and their lambdas), by side effect.  The terminal operation
> runs this Rube-Goldberg contraption in an interpretive manner over the
> elements.  An advantage would arise if the contraption were smaller and simpler
> than a fully-composed stream of today, and the optimizations lost by having an
> interpreter instead of a specialized object ness were insignificant due to the
> small bulk of the stream source.

I don't think it will ever work in real life, because a lot of streams only work by luck, depending on how streams are currently implemented.

Last week, when grading a student project, I saw a stream that can be simplified to
  Arrays.asList(3, null).stream().map(Object::toString).count()
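
To make the point concrete, here is a minimal sketch, not the proposed implementation. Since JDK 9, count() is allowed to skip the pipeline when the size can be computed directly from the source, so on such a JDK the mapper may never run and the null goes unnoticed; on JDK 8 the same stream throws a NullPointerException. The helper eagerMapCount below is hypothetical (it is not a JDK API) and only imitates the "dump the elements into an Object array and run each phase over it" idea quoted above, where the mapper always runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CountByLuck {

    // Hypothetical helper (not a JDK API): evaluates map(...).count() eagerly by
    // copying the elements into an Object[] and running the map phase over it.
    static <T> long eagerMapCount(List<T> source, Function<? super T, ?> mapper) {
        Object[] buffer = source.toArray();
        for (int i = 0; i < buffer.length; i++) {
            @SuppressWarnings("unchecked")
            T element = (T) buffer[i];
            buffer[i] = mapper.apply(element); // the mapper runs for every element
        }
        return buffer.length;
    }

    // Returns true if the action fails with a NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(3, null);

        // Result depends on the JDK: count() may or may not execute the mapper.
        boolean streamThrows =
            throwsNpe(() -> list.stream().map(Object::toString).count());

        // The eager evaluation always calls toString() on the null element.
        boolean eagerThrows =
            throwsNpe(() -> eagerMapCount(list, Object::toString));

        System.out.println("stream count() throws NPE: " + streamThrows);
        System.out.println("eager evaluation throws NPE: " + eagerThrows);
    }
}
```

If the two lines print different answers on your JDK, that is exactly the kind of code that works today by luck and would break if the evaluation strategy changed.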

> 
> -------------
> 
> PR: https://git.openjdk.org/jdk/pull/6275

Rémi