java.util.stream.Stream: API for user-extensible intermediate operations

Viktor Klang viktor.klang at oracle.com
Thu Jun 29 13:24:59 UTC 2023



Over the past 6+ months I've been thinking about, and tinkering with, how we'd be able to expose a user-facing API for extensible intermediate java.util.stream.Stream operations―a feature envisioned all the way back when Streams were created.

I'm now at a point where I have a viable design and implementation, and so I'm turning to you for your feedback: on the direction taken; the API concepts; and, in particular, is there anything which I have overlooked/missed?

>I think this API is overly generic and hard to reason about, for users and IDEs.

The API is for all intents and purposes a Collector with a boolean return type for the accumulator and a downstream handle parameter added to the accumulator and the finisher.
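
A minimal sketch of that shape, for illustration only. The names `MiniGatherer`, `Integrator`, `run` and `windowFixed` are hypothetical and are not the proposed API; the sketch only assumes what the sentence above states: the accumulator returns a boolean (here taken to mean "keep sending elements") and both it and the finisher receive a downstream handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for illustration; not the proposed java.util.stream API.
final class MiniGatherer<T, A, R> {
    // Like Collector's accumulator, but it returns false to signal that no more
    // input is wanted, and it receives a downstream handle for emitting output.
    interface Integrator<A, T, R> {
        boolean integrate(A state, T element, Consumer<? super R> downstream);
    }

    final Supplier<A> initializer;
    final Integrator<A, T, R> integrator;
    final BiConsumer<A, Consumer<? super R>> finisher;

    MiniGatherer(Supplier<A> initializer,
                 Integrator<A, T, R> integrator,
                 BiConsumer<A, Consumer<? super R>> finisher) {
        this.initializer = initializer;
        this.integrator = integrator;
        this.finisher = finisher;
    }

    // Sequential driver: feeds elements until the integrator returns false,
    // then lets the finisher emit whatever is left in the state.
    static <T, A, R> List<R> run(Iterable<T> source, MiniGatherer<T, A, R> gatherer) {
        List<R> out = new ArrayList<>();
        A state = gatherer.initializer.get();
        for (T element : source) {
            if (!gatherer.integrator.integrate(state, element, out::add)) {
                break;
            }
        }
        gatherer.finisher.accept(state, out::add);
        return out;
    }

    // Example operation: groups elements into windows of the given size.
    static <T> MiniGatherer<T, List<T>, List<T>> windowFixed(int size) {
        return new MiniGatherer<>(
            ArrayList::new,
            (window, element, downstream) -> {
                window.add(element);
                if (window.size() == size) {
                    List<T> full = new ArrayList<>(window);
                    downstream.accept(full);
                    window.clear();
                }
                return true; // never short-circuits
            },
            (window, downstream) -> {
                if (!window.isEmpty()) {
                    List<T> rest = new ArrayList<>(window);
                    downstream.accept(rest); // trailing partial window
                }
            });
    }
}
```

With this sketch, `MiniGatherer.run(List.of(1, 2, 3, 4, 5), MiniGatherer.<Integer>windowFixed(2))` yields `[[1, 2], [3, 4], [5]]`.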

>The main issue is that the same API is used for both stateless and stateful operations, which means that as a user, we have no idea if a call to stream.gather() is stateful or not.

How is this different from any of the other pre-existing Stream operations?

Most of the operations are stateless; only a handful of well-known operations are stateful, and all other stateful operations are done by collectors.
But stream.gather() allows both stateless and stateful gatherers.
I believe that instead of having an intermediary operation that can be stateless or stateful, it's better to have a Collector that starts a new stream.

Instead of
  stream.gather(Gatherers.foo())

A collector/gatherer can propagate the elements into a new stream
  stream.collect(Gatherers.foo(stream -> ...))
 or
  stream.collect(Gatherers.foo(), stream -> ...)

This allows better control over the parallelization (both streams are independent) and a clear path to retrofit the Collectors as Gatherers (instead of having two overly similar APIs side by side).
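
A rough approximation of the two-stream shape using only the existing Collectors API, for illustration. It is not the `Gatherers.foo(stream -> ...)` form suggested above: `TwoStreamsDemo` is a hypothetical name, and this variant buffers every element into a list before the second stream starts.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; uses only existing java.util.stream APIs.
final class TwoStreamsDemo {
    static List<String> run() {
        // First stream ends in a terminal collect whose finisher opens a second stream.
        Stream<Integer> second = Stream.of(3, 1, 2)
            .collect(Collectors.collectingAndThen(Collectors.toList(), List::stream));

        // The second stream is independent of the first and could be made
        // parallel (or not) on its own.
        return second.sorted()
                     .map(n -> "#" + n)
                     .collect(Collectors.toList());
    }
}
```

Here `TwoStreamsDemo.run()` returns `[#1, #2, #3]`.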

There is no difference in how parallelization can occur: a parallel Stream already runs multi-stage parallel evaluation when stateful stages exist in the same pipeline.



>Which is a departure from the current API that cleanly separates stateless and stateful operations. Here, we are left in the dark. In a sense, this API is too powerful, it can do too many things, so as a user we cannot reason about it.

A Gatherer encodes its input and output types; in what sense would that not be enough to reason about it?

The initial idea of the API is to have almost all intermediary operations be stateless, so by default we know that the complexity of the stream is linear, that it will parallelize quite well, etc.
Once you have an intermediary operation like a gatherer, all these "good" properties are null and void.

There have been quite a few stateful operations on streams for a long time: the slicing operations, the distinct operations, the sorting operations, the limit operation, the while-operations etc. My bet is that most developers using streams will not know at a glance which of those are to be considered stateful and have an impact on evaluation, and personally I think that is a great thing, as it is an implementation detail.
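
As a small illustration using only existing Stream operations (the class name `StatefulOpsDemo` is hypothetical), the pipeline below mixes a stateless stage with several stateful ones, and nothing at the call site distinguishes them:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; every operation used is part of the existing Stream API.
final class StatefulOpsDemo {
    static List<Integer> pipeline(boolean parallel) {
        Stream<Integer> source = Stream.of(5, 3, 5, 1, 4, 1, 2);
        if (parallel) {
            source = source.parallel();
        }
        return source
            .map(x -> x * 10)   // stateless
            .distinct()         // stateful
            .sorted()           // stateful
            .limit(3)           // stateful and short-circuiting
            .collect(Collectors.toList());
    }
}
```

Both `StatefulOpsDemo.pipeline(false)` and `StatefulOpsDemo.pipeline(true)` return `[10, 20, 30]`.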



>I like the idea of a Collector 2.0, i.e. using the Gatherer API at the end of the stream (not in the middle), but currently, the Gatherer API is not a Collector, so we now have two different APIs for partially doing the same job. I wonder if the Collector API can be retrofitted to act as a Gatherer API, avoiding having to choose which one to use, a gatherer being the equivalent of a "flat-collector" + short-circuit.

Collector serves a very important role of being able to get information out of a Stream and deliver that information in a certain shape, a Gatherer does not provide any facility for this.
A collector can get information out of a Stream into a new one, at that point you have something quite similar to a Gatherer.
It sounds like you're describing Gatherer: it's a Collector-like construct which can output into a new stream.


>The idea of unsupportedCombiner() seems out of place, like a patch to be able to cobble different things together. I'm not sure I understand why it's needed for a Gatherer, and why it is not needed for Collectors?

Nothing prevents us from treating a `null` combiner the same way. My primary reason for making it a dedicated thing was to be able to differentiate a possible bug (user passing in a null reference inadvertently) from explicitly stating that a combiner does not exist for this operation.

unsupportedCombiner() as an artifact can be completely hidden if desired, as Gatherer.of() can have permutations without specifying a combiner, and the default method of Gatherer.combiner() could return unsupportedCombiner(). I opted not to do this initially, because I felt like being explicit about not having a combiner means that it is a conscious decision by the implementor of the Gatherer.
My question is more: why do we need this unsupportedCombiner on a Gatherer and not on a Collector?
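
The distinction between "null by accident" and "explicitly no combiner" can be sketched with a shared sentinel instance. This is an assumption about how such a marker could work, not the proposal's implementation; `CombinerSentinel` and `evaluationMode` are hypothetical names.

```java
import java.util.Objects;
import java.util.function.BinaryOperator;

// Hypothetical illustration; not the proposed API or its implementation.
final class CombinerSentinel {
    // One shared instance, so an identity comparison can recognize it.
    private static final BinaryOperator<Object> UNSUPPORTED = (left, right) -> {
        throw new UnsupportedOperationException("this operation has no combiner");
    };

    @SuppressWarnings("unchecked")
    static <A> BinaryOperator<A> unsupportedCombiner() {
        return (BinaryOperator<A>) UNSUPPORTED;
    }

    // Decides how a stage could be evaluated based on its combiner.
    static <A> String evaluationMode(BinaryOperator<A> combiner) {
        // A null reference is treated as a likely bug, not as "no combiner".
        Objects.requireNonNull(combiner, "combiner");
        return (Object) combiner == UNSUPPORTED ? "sequential" : "parallelizable";
    }
}
```

Passing `unsupportedCombiner()` gives `"sequential"`, a real combiner such as `(a, b) -> a` gives `"parallelizable"`, and `null` fails fast with a NullPointerException.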
We can definitely investigate adding that to Collector as well, but it is outside of the scope of this proposal as it deals with intermediate, not terminal, operations.



>So I would prefer that API to extend the current Collector API but not the intermediary operations. Yes, it's less powerful.
It means that instead of using one stream with a collect-like operation in the middle, users will have to use two streams, one after the other, but it makes the code easier to understand (also, having two streams gives users better control over which part should be in parallel).

That would be something completely different from the goal of providing user-extensible intermediate operations, which is something which this proposal is explicitly trying to address.
User-extensible intermediate operations or a gatherer as a better collector have similar semantics, so it's not completely different.
The difference in semantics between Gatherer and Collector is outlined in the initial post of this document. With that said, I think your position has been made clear: you prefer augmenting the terminal operation rather than introducing an intermediate operation.

Cheers,
√

regards,
Rémi

>Rémi


(If you, like myself, prefer reading pre-rendered markdown, click here<https://cr.openjdk.org/~vklang/Gatherers.html>)


