[External] : Re: Stream Gatherers (JEP 473) feedback
Viktor Klang
viktor.klang at oracle.com
Thu Sep 19 09:30:46 UTC 2024
Hi Anthony,
Bear with me for a moment:
just as nothing forces equals(…) or hashCode() implementations (or any interface implementation, for that matter) to conform to their specifications, I don't see how we could enforce the Gatherer contract any more strongly.
>My belief is that the subject of reusability hasn't come up before because non-reusable Gatherers "just work": as long as instances of such Gatherers are not reused, they don't lead to unexpected results or observable differences in behavior. And so people have been implementing non-reusable Gatherers such as `concat` and `zip` without realizing they aren't compliant. Or maybe they did realize it, but didn't see the downside of being non-compliant.
Alas, there's no place where this could be enforced: users can have their own implementations of Stream, so it cannot be enforced in Stream::gather. Ultimately, it all boils down to specification. If an equals(…) implementation leads to surprising behavior when used with a collection, one typically needs to first ensure that the equals(…) method conforms to its specification before one can presume that the collection has a bug.
For the "just work" scenario, one can only make claims about things which have been proven. So in this case, which tests has the implementation in question passed?
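To make the equals(…) analogy concrete, here is a minimal sketch. The classes `BrokenPoint` and `ConformantPoint` are hypothetical names invented for illustration. `BrokenPoint` overrides equals(…) without hashCode(), which violates the Object contract, and the "surprising behavior" then shows up in HashSet even though HashSet itself has no bug.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: equals(...) is overridden but hashCode() is not,
// so two equal instances will almost certainly have different hash codes.
final class BrokenPoint {
    final int x, y;
    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenPoint
                && ((BrokenPoint) o).x == x
                && ((BrokenPoint) o).y == y;
    }
    // hashCode() intentionally left as the identity-based default.
}

// Hypothetical class: equals(...) and hashCode() agree, as the spec requires.
final class ConformantPoint {
    final int x, y;
    ConformantPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConformantPoint
                && ((ConformantPoint) o).x == x
                && ((ConformantPoint) o).y == y;
    }

    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

final class EqualsContractDemo {
    // Lookup with a conformant key type behaves as expected.
    static boolean conformantLookup() {
        Set<ConformantPoint> set = new HashSet<>();
        set.add(new ConformantPoint(1, 2));
        return set.contains(new ConformantPoint(1, 2));
    }

    // Lookup with the non-conformant key type will almost certainly miss,
    // although the two points are equal according to equals(...).
    static boolean brokenLookup() {
        Set<BrokenPoint> set = new HashSet<>();
        set.add(new BrokenPoint(1, 2));
        return set.contains(new BrokenPoint(1, 2));
    }

    public static void main(String[] args) {
        System.out.println("conformant: " + conformantLookup());
        System.out.println("broken:     " + brokenLookup());
    }
}
```

Nothing in HashSet rejects `BrokenPoint` up front; the only remedy is for the implementation to follow the specification, which is the point being made about Gatherers.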
>Which brings me to my next point: in case of (b), the Javadoc and/or JEP should explain the rationale. Even to me it still seems like a needless restriction.
java.util.stream.Stream does not explain the rationale for why it is single-use, and Collector does not explain why it is reusable. Why would Gatherers be held to a different standard?
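For readers less familiar with the two existing conventions, the following sketch contrasts them. The class and method names (`ReuseDemo`, `streamRejectsSecondUse`, `joinTwice`) are made up for this example. It relies on the JDK's built-in Stream implementation detecting reuse and throwing IllegalStateException, which the Stream Javadoc permits but a custom Stream implementation might not do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ReuseDemo {
    // Returns true if a second terminal operation on the same Stream
    // instance is rejected with IllegalStateException.
    static boolean streamRejectsSecondUse() {
        Stream<String> s = Stream.of("a", "b");
        s.forEach(x -> { });          // first traversal consumes the stream
        try {
            s.forEach(x -> { });      // second traversal of the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // The same Collector instance is applied to two independent streams.
    static List<String> joinTwice() {
        Collector<CharSequence, ?, String> joining = Collectors.joining(",");
        String first = Stream.of("a", "b").collect(joining);
        String second = Stream.of("c", "d").collect(joining);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println(streamRejectsSecondUse());
        System.out.println(joinTwice());
    }
}
```

The Collector can be shared because each `collect` call obtains fresh state from its supplier, which is the same property the Gatherer specification asks of initializer().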
> "protecting the users from being given non-reusable Gatherers"
Think of it more like increasing the odds that users are given spec-conformant Gatherers.
Cheers,
√
Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: Anthony Vanelverdinghe <dev at anthonyv.be>
Sent: Wednesday, 18 September 2024 18:27
To: Viktor Klang <viktor.klang at oracle.com>; core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Subject: [External] : Re: Stream Gatherers (JEP 473) feedback
Hi Viktor
Let me start with a question: is the requirement (a) "a Gatherer SHOULD be reusable", or (b) "a Gatherer MUST be reusable"?
As of today the specification says (b), whereas the implementation matches (a).
In case of (a), I propose to align the specification to allow for compliant, non-reusable Gatherers.
In case of (b), I propose to align the implementation to enforce compliance. Something like:
(1) invoke `initializer()` twice, giving `i1` and `i2`. Discard `i1` and invoke `i2` twice, giving `state1` and `state2`.
(2) invoke `finisher()` twice, giving `f1` and `f2`. Discard `f1` and invoke `f2` twice, the first time with `state1` and a dummy Downstream, the second time with the actual final state, i.e. `state2` after all elements were integrated, and the actual Downstream.
Then backport this change to JDK 23 & 22 and/or do another round of preview in JDK 24.
I'm confident that enforcing compliance would result in significant amounts of feedback questioning the requirement.
My belief is that the subject of reusability hasn't come up before because non-reusable Gatherers "just work": as long as instances of such Gatherers are not reused, they don't lead to unexpected results or observable differences in behavior. And so people have been implementing non-reusable Gatherers such as `concat` and `zip` without realizing they aren't compliant. Or maybe they did realize it, but didn't see the downside of being non-compliant.
Which brings me to my next point: in case of (b), the Javadoc and/or JEP should explain the rationale. Even to me it still seems like a needless restriction. You say:
> And I think the worst of all worlds would be a scenario where you, as a user, are given a Gatherer<X,Y,Z> and you have no idea whether you can re-use it or not.
so I'd guess the rationale is "protecting the users from being given non-reusable Gatherers".
However, I can't readily think of a situation where this would be essential.
If a user creates a Gatherer by invoking a factory method, the factory method can specify whether its result is reusable.
And if a user is given a Gatherer as a method argument, and they need the Gatherer to be reusable, they could change the parameter to a `Supplier<Gatherer>` instead.
> >In a previous response you proposed using `Gatherer concat(Supplier<Stream<T>>)` instead, but then I'd just pass `() -> aStream`, wonder why the parameter isn't just a `Stream<T>`, and the Gatherer would still not be reusable.
>
> There's a very important, to me, difference between the two. In the Stream-case, there exists 0 reusable usages. For the Supplier<Stream>-case the implementation does not restrict re-usability, but rather it is up to the caller to actively opt-out of reusability (which could of course also be declared to be undefined behavior of the implementor of said Gatherer). Local non-reusability decided by the caller > Global non-reusability decided by the callee.
We agree, just that I'd provide 2 factory methods, `concat(Stream<T>)` (non-reusable) and `append(List<T>)` (reusable), whereas you'd provide a 2-in-1 `concat(Supplier<Stream<T>>)`.
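The three factory shapes under discussion can be sketched as follows. To keep the example runnable on JDKs without the Gatherer API, the operations are modeled as plain `UnaryOperator<Stream<T>>` values built on `Stream.concat` rather than as real Gatherers, so this is an analogy and not an implementation of `concat` or `append` as Gatherers. The class name `ConcatVariants` and the helper `failsOnSecondUse` are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConcatVariants {
    // Non-reusable: captures a single-use Stream, so only the first
    // application of the returned operator can succeed.
    static <T> UnaryOperator<Stream<T>> concat(Stream<T> tail) {
        return head -> Stream.concat(head, tail);
    }

    // Reusable: a List is a stable source, and every application
    // opens a fresh Stream over it.
    static <T> UnaryOperator<Stream<T>> append(List<T> tail) {
        return head -> Stream.concat(head, tail.stream());
    }

    // Reusability is decided by the caller: it holds as long as the
    // Supplier hands out a fresh Stream on each call.
    static <T> UnaryOperator<Stream<T>> concat(Supplier<Stream<T>> tail) {
        return head -> Stream.concat(head, tail.get());
    }

    // Applies the operator twice and reports whether the second use fails.
    static boolean failsOnSecondUse(UnaryOperator<Stream<String>> op) {
        op.apply(Stream.of("a")).collect(Collectors.toList());
        try {
            op.apply(Stream.of("b")).collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Stream<String> single = Stream.of("x");
        System.out.println(failsOnSecondUse(concat(Stream.of("x"))));        // Stream parameter
        System.out.println(failsOnSecondUse(append(List.of("x"))));          // List parameter
        System.out.println(failsOnSecondUse(concat(() -> Stream.of("x")))); // fresh Stream per call
        System.out.println(failsOnSecondUse(concat(() -> single)));         // caller opts out
    }
}
```

The last call in `main` is the `() -> aStream` case raised earlier in the thread: the Supplier form does not prevent a caller from passing a single stream, it only moves that decision from the factory to the call site.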
Kind regards, Anthony
September 12, 2024 at 11:55 PM, "Viktor Klang" <viktor.klang at oracle.com> wrote:
>
> Hi Anthony
>
> Great questions! I had typed up a long response when my email client decided the email was too large, crashed, and deleted my draft, so I'll try to recreate what I wrote from memory.
>
> >While I understand that most Gatherers will be reusable, and that it's a desirable characteristic, surely there will also be non-reusable Gatherers?
>
> To me, this is governed by the following parts of the Gatherer specification https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/stream/Gatherer.html :
>
> "Each invocation of initializer(), integrator(), combiner(), and finisher() must return a semantically identical result."
>
> and
>
> "Implementations of Gatherer must not capture, retain, or expose to other threads, the references to the state instance, or the downstream Gatherer.Downstream for longer than the invocation duration of the method which they are passed to."
>
> And I think the worst of all worlds would be a scenario where you, as a user, are given a Gatherer<X,Y,Z> and you have no idea whether you can re-use it or not.
>
> For Stream, the assumption is that they are NOT reusable at all.
> For Gatherer, I think the only reasonable assumption is that they are reusable.
>
> >In particular, any Gatherer that is the result of a factory method with a `Stream<T>` parameter which supports infinite Streams, will be non-reusable, won't it?
>
> Not necessarily, if the factory method **consumes** the Stream and creates a stable result which is reusable, then the resulting Gatherer is reusable.
>
> >In a previous response you proposed using `Gatherer concat(Supplier<Stream<T>>)` instead, but then I'd just pass `() -> aStream`, wonder why the parameter isn't just a `Stream<T>`, and the Gatherer would still not be reusable.
>
> There's a very important, to me, difference between the two. In the Stream-case, there exists 0 reusable usages. For the Supplier<Stream>-case the implementation does not restrict re-usability, but rather it is up to the caller to actively opt-out of reusability (which could of course also be declared to be undefined behavior of the implementor of said Gatherer). Local non-reusability decided by the caller > Global non-reusability decided by the callee.
>
> >As another example, take Gunnar Morling's zip Gatherers:
>
> I don't see how Gatherers like this could be made reusable, or why that would even be desirable.
>
> Having been R&D-ing in the Stream-space for more than a decade, I'm convinced that there's no universally safe way to implement `zip` for push-style stream designs. I'm happy to be proven wrong though, as that would open up some interesting possibilities for things like Stream::iterator() and Stream::spliterator().
>
> >My use case was about a pipeline where the concatenation comes somewhere in the middle of the pipeline.
>
> My apologies, I misunderstood. To me, the operation you describe is called `inject`.
> Given a stable (reusable) source of elements you can definitely implement Gatherers which do before, during, or after-injections of elements to a stream.
>
> Thanks again for the great questions and conversation, it's valuable!
> Cheers,
>
> √
>
> **Viktor Klang**
> Software Architect, Java Platform Group
>
> Oracle
>