JDK-8072840: Presizing for Stream Collectors
Fabian Meumertzheim
fabian at buildbuddy.io
Thu Feb 13 16:11:23 UTC 2025
On Thu, Feb 13, 2025 at 3:06 PM Viktor Klang <viktor.klang at oracle.com> wrote:
> While it may look enticing to merely propagate expected element count as an input parameter to the supplier function,
> I think it deserves some extra thought, specifically if it may make more sense to pass some sort of StreamInfo type which can provide more metadata in the future.
I could see that being useful for properties such as non-nullness,
which would allow collections such as ImmutableList to skip the null
check in the end.
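
To make that concrete, here is a rough sketch of how a collection could consume such metadata. Everything in it is hypothetical: `StreamInfo`, its two components, and the `newContainer`/`finish` helpers are names I made up for illustration, and neither the JDK nor the draft PR defines them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.OptionalLong;

public class StreamInfoSketch {

    // Hypothetical metadata carrier; neither the JDK nor the draft PR defines this type.
    record StreamInfo(OptionalLong exactSize, boolean knownNonNull) {}

    // Supplier-like step: presize the container when an exact size is known.
    static <T> ArrayList<T> newContainer(StreamInfo info) {
        if (info.exactSize().isPresent()) {
            long size = info.exactSize().getAsLong();
            if (size >= 0 && size <= Integer.MAX_VALUE - 8) {
                return new ArrayList<>((int) size);
            }
        }
        // Fall back to the default capacity when the size is unknown or too large.
        return new ArrayList<>();
    }

    // Finisher-like step: skip the per-element null check only if the
    // stream promises that all of its elements are non-null.
    static <T> List<T> finish(ArrayList<T> elements, StreamInfo info) {
        if (!info.knownNonNull()) {
            for (T element : elements) {
                Objects.requireNonNull(element, "null element");
            }
        }
        return Collections.unmodifiableList(elements);
    }

    public static void main(String[] args) {
        StreamInfo info = new StreamInfo(OptionalLong.of(3), true);
        ArrayList<String> container = newContainer(info);
        container.addAll(List.of("a", "b", "c"));
        System.out.println(finish(container, info)); // prints [a, b, c]
    }
}
```

A record-like type leaves room for further properties later without another signature change, which a plain long would not.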
> Another open question is how to propagate this information through Gatherers (i.e. a bigger scope than Collector-augmentation) to enable more sophisticated optimizations—because ultimately the availability of the information throughout the pipeline is going to be important for Collector.
Do you think that there could be a need to pass stream information to
anything other than the Gatherer's state initializer? Based on a
cursory glance, it looks straightforward to pass the same info to it
as to the Collector. If that's true and we go with a more extensible
design than a plain long, Gatherers could be opted in in follow-up
work.
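
As a sketch of why a later opt-in seems feasible: an existing no-argument initializer can be adapted to a size-aware shape that simply ignores the hint, while opted-in implementations use it. I use a plain long here only to keep the example short. The names `ignoringSize` and `presizedList` are hypothetical and exist only in this sketch, and the sketch deliberately avoids the real Gatherer interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;
import java.util.function.Supplier;

public class SizedInitializerSketch {

    // Adapts an existing no-argument initializer so that it can be called
    // with a size hint, which it ignores. Hypothetical helper.
    static <A> LongFunction<A> ignoringSize(Supplier<A> initializer) {
        return expectedSize -> initializer.get();
    }

    // A size-aware initializer of the kind an opted-in Gatherer or
    // Collector might provide. Negative values mean "unknown" in this sketch.
    static <T> LongFunction<List<T>> presizedList() {
        return expectedSize -> {
            if (expectedSize >= 0 && expectedSize <= Integer.MAX_VALUE - 8) {
                return new ArrayList<T>((int) expectedSize);
            }
            return new ArrayList<T>();
        };
    }

    public static void main(String[] args) {
        LongFunction<List<String>> legacy = ignoringSize(ArrayList::new);
        LongFunction<List<String>> optedIn = presizedList();

        // Both produce an empty, usable state object; only the capacity differs.
        List<String> a = legacy.apply(1_000);
        List<String> b = optedIn.apply(1_000);
        a.add("x");
        b.add("x");
        System.out.println(a.equals(b)); // prints true
    }
}
```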
Best,
Fabian
>
>
> Cheers,
> √
>
>
> Viktor Klang
> Software Architect, Java Platform Group
> Oracle
> ________________________________
> From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Fabian Meumertzheim <fabian at buildbuddy.io>
> Sent: Wednesday, 12 February 2025 11:09
> To: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
> Subject: JDK-8072840: Presizing for Stream Collectors
>
> As an avid user of Guava's ImmutableCollections, I have been
> interested in ways to close the efficiency gap between the built-in
> `Stream#toList()` and third-party `Collector` implementations such as
> `ImmutableList#toImmutableList()`. I've found the biggest problem to
> be the lack of sizing information in `Collector`s, which led me to
> draft a solution to JDK-8072840:
> https://github.com/openjdk/jdk/pull/23461
>
> The benchmark shows pretty significant gains for sized streams that
> mostly reshape data (e.g. slice records or turn a list into a map by
> associating keys), which I've found to be a pretty common use case.
>
> Before I formally send out the PR for review, I would like to gather
> feedback on the design aspects of it (rather than the exact
> implementation). I will thus leave it in draft mode for now, but
> invite anyone to comment on it or on this thread.
>
> Fabian