java.util.stream.Stream: API for user-extensible intermediate operations

Thu Jun 29 12:13:53 UTC 2023

> From: "Viktor Klang" <viktor.klang at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "core-libs-dev" <core-libs-dev at openjdk.org>
> Sent: Thursday, June 29, 2023 11:09:22 AM
> Subject: Re: java.util.stream.Stream: API for user-extensible intermediate
> operations

> From: Remi Forax <forax at univ-mlv.fr>
> Sent: Thursday, 29 June 2023 10:03
> To: Viktor Klang <viktor.klang at oracle.com>
> Cc: core-libs-dev <core-libs-dev at openjdk.org>
> Subject: [External] : Re: java.util.stream.Stream: API for user-extensible
> intermediate operations

>> From: "Viktor Klang" <viktor.klang at oracle.com>
>> To: "core-libs-dev" <core-libs-dev at openjdk.org>
>> Sent: Tuesday, June 27, 2023 7:10:42 PM
>> Subject: java.util.stream.Stream: API for user-extensible intermediate
>> operations

>> Hi core-libs-dev,

>> Over the past 6+ months I've been thinking about, and tinkering with, how we'd
>> be able to expose a user-facing API for extensible intermediate
>> java.util.stream.Stream operations—a feature envisioned all the way back when
>> Streams were created.

>> I'm now at a point where I have a viable design and implementation, and so I'm
>> turning to you for your feedback: on the direction taken; the API concepts;
>> and, in particular, is there anything which I have overlooked/missed?

>>I think this API is overly generic and hard to reason about, for users and
>>IDEs.

> The API is for all intents and purposes Collector with a boolean return type for
> the accumulator and a downstream handle parameter added to the
> accumulator and the finisher.

>>The main issue is that the same API is used for both stateless and stateful
>>operations, which means that as a user, we have no idea if a call to
>>stream.gather() is stateful or not.

> How is this different from any of the other pre-existing Stream operations?

Most of the operations are stateless; only a handful of well-known operations are stateful, and all other stateful operations are done by collectors.
But stream.gather() accepts both stateless and stateful gatherers.
I believe that instead of having an intermediate operation that can be either stateless or stateful, it's better to have a Collector that starts a new stream.

Instead of 
stream.gather(Gatherers.foo()) 

A collector/gatherer can propagate the elements into a new stream 
stream.collect(Gatherers.foo(stream -> ...)) 
or 
stream.collect(Gatherers.foo(), stream -> ...) 

This allows better control over parallelization (both streams are independent) and gives a clear path to retrofit Collectors as Gatherers (instead of having two overly similar APIs side by side).
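
A minimal sketch of that two-stream shape, written only against the existing java.util.stream API (no Gatherer involved). The class TwoStreams, the reversed() collector and demo() are hypothetical names chosen for illustration; they are not part of the proposal:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TwoStreams {
    // Hypothetical stateful step expressed as a Collector whose result is a
    // new Stream: the first stream ends here and a second one starts.
    static <T> Collector<T, ?, Stream<T>> reversed() {
        Collector<T, ?, ArrayList<T>> toArrayList = Collectors.toCollection(ArrayList::new);
        return Collectors.collectingAndThen(toArrayList, list -> {
            Collections.reverse(list);   // needs to see every element first
            return list.stream();        // start of the second stream
        });
    }

    static List<String> demo(List<String> words) {
        return words.stream()
                .filter(w -> !w.isEmpty())        // first stream: stateless
                .collect(reversed())              // explicit boundary
                .map(String::toUpperCase)         // second stream: stateless
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo(List.of("a", "", "b", "c"))); // prints [C, B, A]
    }
}
```

The stateful part is visible at the collect() call, and each of the two streams could be made sequential or parallel on its own.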

>>Which is a departure from the current API that cleanly separates stateless and
>>stateful operations. Here, we are left in the dark. In a sense, this API is too
>>powerful, it can do too many things, so as a user we cannot reason about it.

> A Gatherer encodes its input and output types, in what sense would that not be
> enough to reason about it?

The initial idea of the API is that almost all intermediate operations are stateless, so by default we know that the complexity of the stream is linear, that it will parallelize quite well, etc.
Once you have an intermediate operation like a gatherer, all these "good" properties are null and void.
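
The difference can already be observed with the existing operations. The sketch below counts how many source elements are pulled before findFirst() returns, once with only stateless operations and once with sorted() in the middle. The class and method names are hypothetical, and the counts reflect how current OpenJDK builds behave for a sequential stream rather than anything the specification promises about peek():

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class StatefulBarrier {
    // Only stateless intermediate operations: elements flow one at a time,
    // so findFirst() can stop after the first element.
    static int pulledWithStatelessOps() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.of(3, 1, 2)
              .peek(x -> pulled.incrementAndGet())
              .map(x -> x * 10)
              .findFirst();
        return pulled.get();
    }

    // sorted() is stateful: it has to buffer every upstream element
    // before it can emit the first one downstream.
    static int pulledWithSorted() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.of(3, 1, 2)
              .peek(x -> pulled.incrementAndGet())
              .sorted()
              .findFirst();
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println(pulledWithStatelessOps()); // expected: 1
        System.out.println(pulledWithSorted());       // expected: 3
    }
}
```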

>>I like the idea of a Collector 2.0 i.e. using the Gatherer API at the end of the
>>stream (not in the middle), but currently, the Gatherer API is not a Collector,
>>so we now have two different APIs for doing partially the same job. I wonder if
>>the Collector API can be retrofitted to act as a Gatherer API, so that users
>>do not have to choose which one to use, a gatherer being the equivalent of a
>>"flat-collector" + short-circuit.

> Collector serves a very important role of being able to get information out of a
> Stream and deliver that information in a certain shape, a Gatherer does not
> provide any facility for this.

A collector can get information out of a Stream and into a new one; at that point you have something quite similar to a Gatherer.

>>The idea of unsupportedCombiner() seems out of place, like a patch to be able to
>>cobble different things together. I'm not sure I understand why it's needed
>>for a Gatherer, and why it is not needed for Collectors?

> Nothing prevents us from treating a `null` combiner the same way. My primary
> reason for making it a dedicated thing was to be able to differentiate a
> possible bug (user passing in a null reference inadvertently) from explicitly
> stating that a combiner does not exist for this operation.

> unsupportedCombiner() as an artifact can be completely hidden if desired, as
> Gatherer.of() can have permutations without specifying a combiner, and the
> default method of Gatherer.combiner() could return unsupportedCombiner(). I
> opted not to do this initially, because I felt like being explicit about not
> having a combiner means that it is a conscious decision by the implementor of
> the Gatherer.

My question is more: why do we need unsupportedCombiner() on a Gatherer and not on a Collector?
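
For comparison, a sketch of how a Collector that has no meaningful combiner can be written today: Collector.of() always takes a combiner, so the author supplies one that throws. The class NoCombiner and its methods are hypothetical names for illustration. The sketch relies on the combiner not being invoked for a sequential stream:

```java
import java.util.List;
import java.util.stream.Collector;

public class NoCombiner {
    // Concatenates strings; only meant for sequential streams.
    static Collector<String, StringBuilder, String> concatSequentialOnly() {
        return Collector.of(
                () -> new StringBuilder(),
                (sb, s) -> sb.append(s),
                // There is no "unsupported" marker on Collector, so the
                // combiner has to exist and can only fail when it is called.
                (left, right) -> {
                    throw new UnsupportedOperationException("sequential only");
                },
                sb -> sb.toString());
    }

    static String concat(List<String> parts) {
        return parts.stream().collect(concatSequentialOnly());
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of("a", "b", "c"))); // prints abc
    }
}
```

With this shape the stream implementation cannot tell in advance that the combiner is unusable; it only finds out if a parallel evaluation actually calls it.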

>>So I would prefer that API to extend the current Collector API but not the
>>intermediate operations. Yes, it's less powerful.
>>It means that instead of using one stream with a collect-like operation in the
>>middle, users will have to use two streams, one after the other, but it makes
>>the code easier to understand (also having two streams gives users better
>>control on which part should be in parallel).

> That would be something completely different from the goal of providing
> user-extensible intermediate operations, which is something which this proposal
> is explicitly trying to address.

User-extensible intermediate operations and a gatherer as a better collector have similar semantics, so it's not completely different.

> Cheers,
> √

regards, 
Rémi 

>>Rémi

>> (If you, like myself, prefer reading pre-rendered markdown, see
>> https://cr.openjdk.org/~vklang/Gatherers.html )