[External] : Re: JDK-8072840: Presizing for Stream Collectors

Viktor Klang viktor.klang at oracle.com
Thu Feb 13 16:45:19 UTC 2025


>I could see that being useful for properties such as non-nullness,
which would allow collections such as ImmutableList to skip the null
check in the end.

I'm thinking of things like ordered/unordered, whether the stream is parallel or not (one might want to use a different representation for a sequential stream), etc.
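
To make the idea concrete, here is a minimal sketch of what such a metadata type could look like. Everything in it is an assumption for illustration: `StreamInfo`, `from` and `newContainer` are hypothetical names that exist neither in the JDK nor in the draft PR. Only the `Spliterator` calls (`getExactSizeIfKnown`, `hasCharacteristics`) are existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.Spliterator;

public class StreamInfoSketch {

    /** Hypothetical metadata carrier; nothing like it exists in the JDK today. */
    record StreamInfo(OptionalLong exactSize, boolean ordered, boolean parallel) {

        /** Derives the metadata from what a Spliterator already reports. */
        static StreamInfo from(Spliterator<?> source, boolean parallel) {
            long size = source.getExactSizeIfKnown(); // -1 when the size is not known
            return new StreamInfo(
                    size >= 0 ? OptionalLong.of(size) : OptionalLong.empty(),
                    source.hasCharacteristics(Spliterator.ORDERED),
                    parallel);
        }
    }

    /** Hypothetical info-aware container factory: presizes only when the size is known. */
    static <T> List<T> newContainer(StreamInfo info) {
        if (info.exactSize().isPresent()
                && info.exactSize().getAsLong() <= Integer.MAX_VALUE - 8) {
            return new ArrayList<>((int) info.exactSize().getAsLong());
        }
        return new ArrayList<>(); // unknown or very large size: use default growth
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("a", "b", "c"));
        StreamInfo info = StreamInfo.from(words.spliterator(), false);
        List<String> target = newContainer(info);
        target.addAll(words);
        System.out.println(info + " -> " + target);
    }
}
```

Compared with a plain long, a record like this leaves room for further properties (for example non-nullness) without changing the signature of the function that receives it.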

>Do you think that there could be a need to pass stream information to
anything other than the Gatherer's state initializer? Based on a
cursory glance, it looks straightforward to pass the same info to it
as to the Collector. If that's true and we go with a more extensible
design than a plain long, Gatherers could be opted in in follow-up
work.

It's more involved than that: because Gatherers produce output, it would be necessary to devise a scheme that allows each Gatherer to communicate upper and lower bounds on its output. This information would then need to be threaded through the chain of gatherers and emerge on the other side. That is harder than communicating characteristics, since the bounds depend on the stream and not merely on the operation itself.
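
A rough sketch of what threading such bounds through a chain could look like follows. It is purely illustrative: `SizeBounds` and the stage factories (`oneToOne`, `filtering`, `limiting`, `expanding`) are hypothetical names, and real Gatherers expose nothing of the kind today. Each stage is modeled as a function from the bounds of its input to the bounds of its output.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class SizeBoundsSketch {

    /** Hypothetical lower/upper bound on the number of elements at one point of a pipeline. */
    record SizeBounds(long lower, long upper) {
        static final long UNBOUNDED = Long.MAX_VALUE;

        static SizeBounds exactly(long n) {
            return new SizeBounds(n, n);
        }

        boolean isExact() {
            return lower == upper;
        }
    }

    /** A map-like stage: one output per input, so the bounds pass through unchanged. */
    static UnaryOperator<SizeBounds> oneToOne() {
        return b -> b;
    }

    /** A filter-like stage: may drop anything, so only the upper bound survives. */
    static UnaryOperator<SizeBounds> filtering() {
        return b -> new SizeBounds(0, b.upper());
    }

    /** A limit-like stage: caps both bounds at max. */
    static UnaryOperator<SizeBounds> limiting(long max) {
        return b -> new SizeBounds(Math.min(b.lower(), max), Math.min(b.upper(), max));
    }

    /** A flatMap-like stage: nothing useful can be said about the output size. */
    static UnaryOperator<SizeBounds> expanding() {
        return b -> new SizeBounds(0, SizeBounds.UNBOUNDED);
    }

    /** Threads the source bounds through every stage, in order. */
    static SizeBounds propagate(SizeBounds source, List<UnaryOperator<SizeBounds>> stages) {
        SizeBounds current = source;
        for (UnaryOperator<SizeBounds> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        SizeBounds source = SizeBounds.exactly(100);
        System.out.println(propagate(source, List.of(oneToOne(), limiting(10))));
        System.out.println(propagate(source, List.of(filtering(), limiting(10))));
    }
}
```

The point of the sketch is that the result depends on the source bounds as well as on every stage in between, which is why a static set of per-operation characteristics is not enough.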

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle

________________________________
From: Fabian Meumertzheim <fabian at buildbuddy.io>
Sent: Thursday, 13 February 2025 17:11
To: Viktor Klang <viktor.klang at oracle.com>
Cc: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Subject: [External] : Re: JDK-8072840: Presizing for Stream Collectors

On Thu, Feb 13, 2025 at 3:06 PM Viktor Klang <viktor.klang at oracle.com> wrote:
> While it may look enticing to merely propagate expected element count as an input parameter to the supplier function,
> I think it deserves some extra thought, specifically if it may make more sense to pass some sort of StreamInfo type which can provide more metadata in the future.

I could see that being useful for properties such as non-nullness,
which would allow collections such as ImmutableList to skip the null
check in the end.

> Another open question is how to propagate this information through Gatherers (i.e. a bigger scope than Collector-augmentation) to enable more sophisticated optimizations—because ultimately the availability of the information throughout the pipeline is going to be important for Collector.

Do you think that there could be a need to pass stream information to
anything other than the Gatherer's state initializer? Based on a
cursory glance, it looks straightforward to pass the same info to it
as to the Collector. If that's true and we go with a more extensible
design than a plain long, Gatherers could be opted in in follow-up
work.

Best,
Fabian

>
>
> Cheers,
> √
>
> Viktor Klang
> Software Architect, Java Platform Group
> Oracle
> ________________________________
> From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Fabian Meumertzheim <fabian at buildbuddy.io>
> Sent: Wednesday, 12 February 2025 11:09
> To: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
> Subject: JDK-8072840: Presizing for Stream Collectors
>
> As an avid user of Guava's ImmutableCollections, I have been
> interested in ways to close the efficiency gap between the built-in
> `Stream#toList()` and third-party `Collector` implementations such as
> `ImmutableList#toImmutableList()`. I've found the biggest problem to
> be the lack of sizing information in `Collector`s, which led me to
> draft a solution to JDK-8072840:
> https://github.com/openjdk/jdk/pull/23461
>
> The benchmark shows pretty significant gains for sized streams that
> mostly reshape data (e.g. slice records or turn a list into a map by
> associating keys), which I've found to be a pretty common use case.
>
> Before I formally send out the PR for review, I would like to gather
> feedback on the design aspects of it (rather than the exact
> implementation). I will thus leave it in draft mode for now, but
> invite anyone to comment on it or on this thread.
>
> Fabian