[External] : Re: Stream Gatherers (JEP 473) feedback
Viktor Klang
viktor.klang at oracle.com
Mon Sep 23 15:19:48 UTC 2024
Hi Anthony,
>The idea is to collect feedback, to see how many people report their Gatherers being broken (i.e. their Gatherers being non-compliant without realizing it), so enforcing it in `Stream::gather` is sufficient for this purpose.
Even if this is well-intentioned, my experience tells me that this feedback will not materialize, and trying to enforce conformance at runtime will have a noticeable performance impact that other intermediate operations are not encumbered by, especially when processing the vast majority of streams (which tend to be fewer than 10 elements in size).
>This hasn't come up before because it requires people (a) to read the Javadoc, (b) to connect the dots and conclude "thus, a Gatherer must be reusable", and (c) to be willing to invest their time in asking the question, rather than moving on since their Gatherers "just work".
Adding clarifications to the Javadoc may be the most balanced path forward; in doing so, we're talking about updating the documentation for both Gatherer and Collector.
>Not sure I understand this argument? I'd argue that increasing those odds would be done by allowing an additional category of Gatherers, not by prohibiting it?
No, that would be "moving the goalposts" i.e. making Gatherers specified more loosely. Developers will write the code that they write, but if something isn't behaving as expected, it is important to know which side to debug—the library or the user code.
> I've written a `concat` Gatherer being blissfully unaware that it was not compliant, others have written non-reusable Gatherers as well: they exist and things like `concat` and `zip` are natural/intuitive use cases for Gatherers.
The ability for developers to implement interfaces (knowingly or unknowingly) in non-spec-conforming ways aside, are you suggesting that we add a section to the Gatherer (and Collector) Javadoc in the form of "Implementor Notes" that are a bit more high-level than the specification?
Cheers,
√
Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: Anthony Vanelverdinghe <dev at anthonyv.be>
Sent: Thursday, 19 September 2024 20:57
To: Viktor Klang <viktor.klang at oracle.com>; core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Subject: [External] : Re: Stream Gatherers (JEP 473) feedback
Hi Viktor
> Alas, there's no place where this could be enforced, users could have their own implementations of Stream (so cannot be enforced in Stream::gather).
The idea is to collect feedback, to see how many people report their Gatherers being broken (i.e. their Gatherers being non-compliant without realizing it), so enforcing it in `Stream::gather` is sufficient for this purpose.
This hasn't come up before because it requires people (a) to read the Javadoc, (b) to connect the dots and conclude "thus, a Gatherer must be reusable", and (c) to be willing to invest their time in asking the question, rather than moving on since their Gatherers "just work".
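The enforcement check proposed earlier in this thread (call `initializer()` and `finisher()` twice each, discard the first result, and run the finisher once against a dummy Downstream) can be sketched roughly as below. This is only an illustration under stated assumptions: it assumes JDK 24 or later (where `java.util.stream.Gatherer` is final; on JDK 22/23 it needs `--enable-preview`), the class `CheckedGather`, its sequential `run` driver and the `concat` helper are hypothetical names, and it is not how the JDK evaluates `Stream::gather`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class CheckedGather {

    /** Hypothetical sequential driver; NOT how the JDK evaluates Stream::gather. */
    static <T, A, R> List<R> run(Iterable<T> source, Gatherer<T, A, R> gatherer) {
        // (1) Call initializer() twice, discard the first supplier,
        //     then obtain two states from the second one.
        Supplier<A> i1 = gatherer.initializer();
        Supplier<A> i2 = gatherer.initializer();
        A state1 = i2.get();
        A state2 = i2.get();

        List<R> out = new ArrayList<>();
        Gatherer.Downstream<R> actual = element -> { out.add(element); return true; };
        Gatherer.Downstream<R> dummy = element -> true; // accepts and drops everything

        // Integrate all elements into state2 only; state1 stays untouched.
        Gatherer.Integrator<A, T, R> integrator = gatherer.integrator();
        for (T element : source) {
            if (!integrator.integrate(state2, element, actual)) {
                break;
            }
        }

        // (2) Call finisher() twice, discard the first one, then run the second
        //     against (state1, dummy) before running it against (state2, actual).
        BiConsumer<A, Gatherer.Downstream<? super R>> f1 = gatherer.finisher();
        BiConsumer<A, Gatherer.Downstream<? super R>> f2 = gatherer.finisher();
        f2.accept(state1, dummy);
        f2.accept(state2, actual);
        return out;
    }

    /** Hypothetical non-reusable Gatherer: its finisher consumes a captured Stream. */
    static <T> Gatherer<T, ?, T> concat(Stream<T> tail) {
        Gatherer.Integrator<Void, T, T> passThrough =
            (state, element, downstream) -> downstream.push(element);
        return Gatherer.ofSequential(
            passThrough,
            (state, downstream) -> tail.forEach(downstream::push));
    }
}
```

With this driver, a Gatherer whose finisher consumes a captured `Stream` fails on the second finisher call, because the dummy run has already used up the stream; a Gatherer such as `Gatherers.windowFixed(2)` passes through unchanged.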
> java.util.stream.Stream does not explain the rationale for why it is single-use, Collector does not explain why they are reusable, why would Gatherers be held to a different standard?
For `Stream` the package Javadoc has statements like "No storage. A stream is not a data structure" and "Possibly unbounded.", which is sufficient rationale to me.
For `Collector`, unless I'm missing something, it does not actually specify that it must be reusable, so it does not have to provide a rationale for it either. Even if I did miss something and reusability is implied from the specification: the question would likely never come up, because a Collector will in practice always be reusable anyway (read: I can't readily think of a sensible non-reusable Collector). This is unlike Gatherer, where some obvious use cases such as `concat` and `zip` exist and people like me wonder why such use cases are, apparently needlessly, prohibited by the Gatherer specification.
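To make the contrast concrete, here is a small sketch of the two behaviours being compared: a `Stream` instance can only be traversed once, whereas one `Collector` instance is routinely applied to several streams. The class and method names (`ReuseDemo`, `streamIsSingleUse`, `collectorReused`) are made up for illustration, and the `IllegalStateException` relies on the JDK's built-in stream implementation detecting reuse, which the specification permits but does not strictly require.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReuseDemo {

    /** Returns true if a second terminal operation on the same Stream is rejected. */
    static boolean streamIsSingleUse() {
        Stream<String> stream = Stream.of("a", "b");
        stream.forEach(x -> { });          // first traversal consumes the stream
        try {
            stream.forEach(x -> { });      // second traversal of the same instance
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Applies one Collector instance to two independent streams. */
    static List<String> collectorReused() {
        Collector<CharSequence, ?, String> joining = Collectors.joining(",");
        String first = Stream.of("a", "b").collect(joining);
        String second = Stream.of("c", "d").collect(joining);
        return List.of(first, second);
    }
}
```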
> Think of it more like increasing the odds that users are given spec-conformant Gatherers.
Not sure I understand this argument? I'd argue that increasing those odds would be done by allowing an additional category of Gatherers, not by prohibiting it? I've written a `concat` Gatherer being blissfully unaware that it was not compliant, others have written non-reusable Gatherers as well: they exist and things like `concat` and `zip` are natural/intuitive use cases for Gatherers. Gunnar wrote a blog post [https://www.morling.dev/blog/zipping-gatherer/] about his `zip` Gatherer saying "Java 22 [...] promises to improve the situation here." and none of his readers pointed out that his Gatherer is not compliant either (nor complained that his Gatherer is not reusable).
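For readers following along, the three factory shapes discussed in this thread, `concat(Stream<T>)`, `append(List<T>)` and `concat(Supplier<Stream<T>>)`, might look roughly like the sketch below. It assumes JDK 24 or later (Gatherers are a preview API in JDK 22/23), the class name `ConcatGatherers` and the method bodies are illustrative guesses rather than anyone's actual implementation, and only sequential evaluation is considered.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class ConcatGatherers {

    /** "2-in-1" variant: reusable exactly when the supplier yields a fresh Stream per call. */
    static <T> Gatherer<T, ?, T> concat(Supplier<Stream<T>> tail) {
        Gatherer.Integrator<Void, T, T> passThrough =
            (state, element, downstream) -> downstream.push(element);
        return Gatherer.ofSequential(
            passThrough,
            (state, downstream) -> {
                // Stop as soon as downstream rejects, so an infinite tail is not drained.
                Iterator<T> it = tail.get().iterator();
                while (it.hasNext() && downstream.push(it.next())) {
                    // push(...) already did the work
                }
            });
    }

    /** Non-reusable: every evaluation would need to consume the same Stream instance. */
    static <T> Gatherer<T, ?, T> concat(Stream<T> tail) {
        return concat(() -> tail);
    }

    /** Reusable: a stable copy of the elements is streamed afresh on every evaluation. */
    static <T> Gatherer<T, ?, T> append(List<T> tail) {
        List<T> copy = List.copyOf(tail);
        return concat(copy::stream);
    }
}
```

Used with `Stream::gather`, the result of `append(List.of(3, 4))` can be applied to any number of streams. The result of `concat(Stream.of(3, 4))` works for one evaluation, and with the JDK's built-in streams a second evaluation fails with an `IllegalStateException` when the finisher touches the already-consumed stream.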
Kind regards, Anthony
September 19, 2024 at 11:30 AM, "Viktor Klang" <viktor.klang at oracle.com> wrote:
>
> Hi Anthony,
>
> Bear with me for a moment,
>
> in the same vein as there's nothing which *enforces* equals(…) or hashCode() to be conformant to their specs, or any interface-implementation for that matter, I don't see how we could make any stronger enforcement of Gatherers.
>
> >My belief is that the subject of reusability hasn't come up before because non-reusable Gatherers "just work": as long as instances of such Gatherers are not reused, they don't lead to unexpected results or observable differences in behavior. And so people have been implementing non-reusable Gatherers such as `concat` and `zip` without realizing they aren't compliant. Or maybe they did realize it, but didn't see the downside of being non-compliant.
>
> Alas, there's no place where this could be enforced, users could have their own implementations of Stream (so cannot be enforced in Stream::gather). Ultimately, it all boils down to specification—if an equals(…)-method implementation leads to surprising behavior when used with a collection, one typically needs to first ensure that the equals(…)-method conforms to its expected specification before one can presume that the collection has a bug.
>
> For the "just work" scenario, one can only make claims about things which have been proven. So in this case, what tests have passed for the implementation in question?
>
> >Which brings me to my next point: in case of (b), the Javadoc and/or JEP should explain the rationale. Even to me it still seems like a needless restriction.
> java.util.stream.Stream does not explain the rationale for why it is single-use, Collector does not explain why they are reusable, why would Gatherers be held to a different standard?
>
> > "protecting the users from being given non-reusable Gatherers"
> Think of it more like increasing the odds that users are given spec-conformant Gatherers.
>
> Cheers,
>
> √
>
> **Viktor Klang**
> Software Architect, Java Platform Group
>
> Oracle
>
> ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
>
> **From:** Anthony Vanelverdinghe <dev at anthonyv.be>
> **Sent:** Wednesday, 18 September 2024 18:27
> **To:** Viktor Klang <viktor.klang at oracle.com>; core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
> **Subject:** [External] : Re: Stream Gatherers (JEP 473) feedback
>
>
> Hi Viktor
>
> Let me start with a question: is the requirement (a) "a Gatherer SHOULD be reusable", or (b) "a Gatherer MUST be reusable"?
>
> As of today the specification says (b), whereas the implementation matches (a).
>
> In case of (a), I propose to align the specification to allow for compliant, non-reusable Gatherers.
>
> In case of (b), I propose to align the implementation to enforce compliance. Something like:
>
> (1) invoke `initializer()` twice, giving `i1` and `i2`. Discard `i1` and invoke `i2` twice, giving `state1` and `state2`.
>
> (2) invoke `finisher()` twice, giving `f1` and `f2`. Discard `f1` and invoke `f2` twice, the first time with `state1` and a dummy Downstream, the second time with the actual final state, i.e. `state2` after all elements were integrated, and the actual Downstream.
>
> Then backport this change to JDK 23 & 22 and/or do another round of preview in JDK 24.
>
> I'm confident that enforcing compliance would result in significant amounts of feedback questioning the requirement.
>
> My belief is that the subject of reusability hasn't come up before because non-reusable Gatherers "just work": as long as instances of such Gatherers are not reused, they don't lead to unexpected results or observable differences in behavior. And so people have been implementing non-reusable Gatherers such as `concat` and `zip` without realizing they aren't compliant. Or maybe they did realize it, but didn't see the downside of being non-compliant.
>
> Which brings me to my next point: in case of (b), the Javadoc and/or JEP should explain the rationale. Even to me it still seems like a needless restriction. You say:
>
> > And I think the worst of all worlds would be a scenario where you, as a user, are given a Gatherer<X,Y,Z> and you have no idea whether you can re-use it or not.
>
> so I'd guess the rationale is "protecting the users from being given non-reusable Gatherers".
>
> However, I can't readily think of a situation where this would be essential.
>
> If a user creates a Gatherer by invoking a factory method, the factory method can specify whether its result is reusable.
>
> And if a user is given a Gatherer as a method argument, and they need the Gatherer to be reusable, they could change the parameter to a `Supplier<Gatherer>` instead.
>
> > >In a previous response you proposed using `Gatherer concat(Supplier<Stream<T>>)` instead, but then I'd just pass `() -> aStream`, wonder why the parameter isn't just a `Stream<T>`, and the Gatherer would still not be reusable.
>
> >
>
> > There's a very important, to me, difference between the two. In the Stream-case, there exists 0 reusable usages. For the Supplier<Stream>-case the implementation does not restrict re-usability, but rather it is up to the caller to actively opt-out of reusability (which could of course also be declared to be undefined behavior of the implementor of said Gatherer). Local non-reusability decided by the caller > Global non-reusability decided by the callee.
>
> We agree, just that I'd provide 2 factory methods, `concat(Stream<T>)` (non-reusable) and `append(List<T>)` (reusable), whereas you'd provide a 2-in-1 `concat(Supplier<Stream<T>>)`.
>
> Kind regards, Anthony
>
> September 12, 2024 at 11:55 PM, "Viktor Klang" <viktor.klang at oracle.com> wrote:
>
> >
>
> > Hi Anthony
>
> >
>
> > Great questions! I had typed up a long response when my email client decided the email was too large, crashed, and deleted my draft, so I'll try to recreate what I wrote from memory.
>
> >
>
> > >While I understand that most Gatherers will be reusable, and that it's a desirable characteristic, surely there will also be non-reusable Gatherers?
>
> >
>
> > To me, this is governed by the following parts of the Gatherer specification https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/stream/Gatherer.html :
>
> >
>
> > "Each invocation of initializer(), integrator(), combiner(), and finisher() must return a semantically identical result."
>
> >
>
> > and
>
> >
>
> > "Implementations of Gatherer must not capture, retain, or expose to other threads, the references to the state instance, or the downstream Gatherer.Downstream for longer than the invocation duration of the method which they are passed to."
>
> >
>
> > And I think the worst of all worlds would be a scenario where you, as a user, are given a Gatherer<X,Y,Z> and you have no idea whether you can re-use it or not.
>
> >
>
> > For Stream, the assumption is that they are NOT reusable at all.
>
> > For Gatherer, I think the only reasonable assumption is that they are reusable.
>
> >
>
> > >In particular, any Gatherer that is the result of a factory method with a `Stream<T>` parameter which supports infinite Streams, will be non-reusable, won't it?
>
> >
>
> > Not necessarily, if the factory method **consumes** the Stream and creates a stable result which is reusable, then the resulting Gatherer is reusable.
>
> >
>
> > >In a previous response you proposed using `Gatherer concat(Supplier<Stream<T>>)` instead, but then I'd just pass `() -> aStream`, wonder why the parameter isn't just a `Stream<T>`, and the Gatherer would still not be reusable.
>
> >
>
> > There's a very important, to me, difference between the two. In the Stream-case, there exists 0 reusable usages. For the Supplier<Stream>-case the implementation does not restrict re-usability, but rather it is up to the caller to actively opt-out of reusability (which could of course also be declared to be undefined behavior of the implementor of said Gatherer). Local non-reusability decided by the caller > Global non-reusability decided by the callee.
>
> >
>
> > >As another example, take Gunnar Morling's zip Gatherers:
>
> >
>
> > I don't see how Gatherers like this could be made reusable, or why that would even be desirable.
>
> >
>
> > Having been R&D-ing in the Stream-space for more than a decade, I'm convinced that there's no universally safe way to implement `zip` for push-style stream designs. I'm happy to be proven wrong though, as that would open up some interesting possibilities for things like Stream::iterator() and Stream::spliterator().
>
> >
>
> > >My use case was about a pipeline where the concatenation comes somewhere in the middle of the pipeline.
>
> >
>
> > My apologies, I misunderstood. To me, the operation you describe is called `inject`.
>
> > Given a stable (reusable) source of elements you can definitely implement Gatherers which do before, during, or after-injections of elements to a stream.
>
> >
>
> > Thanks again for the great questions and conversation, it's valuable!
>
> > Cheers,
>
> >
>
> > √
>
> >
>
> > **Viktor Klang**
>
> > Software Architect, Java Platform Group
>
> >
>
> > Oracle
>
> >
>