Stream Gatherers (JEP 473) feedback

Viktor Klang viktor.klang at oracle.com
Mon Jul 29 18:08:45 UTC 2024


Hi Anthony,

Thank you for your patience and for providing feedback; it is always much appreciated.


> When writing factory methods for Gatherers, there's sometimes a
> degenerate case that requires returning a no-op Gatherer. So I'd like a
> way to mark a no-op Gatherer as such, allowing the Stream implementation
> to recognize and eliminate it from the pipeline. One idea is to add
> Gatherer.defaultIntegrator(), analogous to the other default… methods.
> Another is to add Gatherers.identity(), analogous to Function.identity().

I contemplated adding that, but in the end I decided I didn't want to add it for its own sake;
I would rather add it once it is deemed necessary.

Do you have a concrete use-case (code) that you could share?
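
To make the question concrete, here is a minimal sketch of what such a degenerate case might look like. The factory name everyNth and the helper passThrough are hypothetical, chosen only for illustration; the sketch assumes a JDK where java.util.stream.Gatherer is available (final in 24, preview in 22 and 23). Note that the pass-through Gatherer below is an ordinary Gatherer; nothing marks it as a no-op that the Stream implementation could recognize and remove.

```java
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

final class NoOpGathererSketch {

    // Hypothetical factory: keeps every n-th element of the stream.
    // For n == 1 every element is kept, which is the degenerate no-op case.
    static <T> Gatherer<T, ?, T> everyNth(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (n == 1) {
            return passThrough();
        }
        Gatherer.Integrator<int[], T, T> integrator = (counter, element, downstream) -> {
            counter[0]++;
            if (counter[0] == n) {
                counter[0] = 0;
                return downstream.push(element); // stop if downstream wants no more
            }
            return true;
        };
        return Gatherer.ofSequential(() -> new int[1], integrator);
    }

    // Hand-written no-op: a stateless, greedy integrator that forwards each element.
    static <T> Gatherer<T, ?, T> passThrough() {
        Gatherer.Integrator.Greedy<Void, T, T> forward =
                (unused, element, downstream) -> downstream.push(element);
        return Gatherer.of(forward);
    }

    public static void main(String[] args) {
        List<Integer> second = Stream.of(1, 2, 3, 4, 5, 6).gather(everyNth(2)).toList();
        List<Integer> all = Stream.of(1, 2, 3, 4, 5, 6).gather(everyNth(1)).toList();
        System.out.println(second); // [2, 4, 6]
        System.out.println(all);    // [1, 2, 3, 4, 5, 6]
    }
}
```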

> Sometimes a factory method returns a Gatherer that only works correctly
> if the upstream has certain characteristics, for example
> Spliterator.SORTED or Spliterator.DISTINCT.

Do you have a concrete use-case (code) that you could share?

> One idea is to add methods
> like Gatherers.sorted() and Gatherers.distinct(), where the Stream
> implementation would be able to recognize and eliminate these from the
> pipeline if the upstream already has these characteristics. That way
> we'd be able to write `return Gatherers.sorted().andThen(…);`. Another
> idea is to provide a Gatherer with a way to inspect the upstream
> characteristics. If the upstream is missing the required
> characteristic(s), it could then throw an IllegalStateException.

For a rather long time, Gatherer had characteristics. However,
what I noticed is that, once Gatherers were composed, the combination of characteristics
almost always added overhead and devolved into the empty set
very quickly.

Also, when it comes to things like sorted() and distinct(), they (by necessity) have to process their input in full
before emitting anything downstream, which creates a lot of extra memory allocation and doesn't lend itself all that well to depth-first streaming.
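
As a rough illustration of the buffering cost for the sorted case, here is a sketch of how a sorting Gatherer could be written by hand. The name bufferedSort is hypothetical and this is not a proposed API; the point is only that the integrator can do nothing but collect, and all output happens in the finisher once the upstream is exhausted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

final class BufferedSortSketch {

    // Hypothetical sorting Gatherer: buffers every element, emits only at the end.
    static <T extends Comparable<? super T>> Gatherer<T, ?, T> bufferedSort() {
        // The integrator never pushes downstream; it only grows the buffer.
        Gatherer.Integrator.Greedy<ArrayList<T>, T, T> collect =
                (buffer, element, downstream) -> {
                    buffer.add(element);
                    return true;
                };
        // The finisher sorts the full buffer and only then starts emitting.
        BiConsumer<ArrayList<T>, Gatherer.Downstream<? super T>> emitSorted =
                (buffer, downstream) -> {
                    buffer.sort(Comparator.naturalOrder());
                    for (T element : buffer) {
                        if (!downstream.push(element)) {
                            break; // downstream does not want more elements
                        }
                    }
                };
        return Gatherer.ofSequential(() -> new ArrayList<T>(), collect, emitSorted);
    }

    // Records the order in which elements enter and leave the Gatherer.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
                .peek(e -> log.add("in " + e))
                .gather(BufferedSortSketch.<Integer>bufferedSort())
                .peek(e -> log.add("out " + e))
                .toList();
        return log;
    }

    public static void main(String[] args) {
        // Every "in" entry precedes the first "out" entry.
        System.out.println(trace()); // [in 3, in 1, in 2, out 1, out 2, out 3]
    }
}
```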

> The returns clause of Gatherer.Integrator::integrate just states "true
> if subsequent integration is desired, false if not". In particular, it
> doesn't document the behavior I'm observing, that returning false also
> causes downstream to reject any further output elements.

Do you have a test case? (A bug in this area was fixed after 22 was released, so you may want to test on a 23-ea build.)
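
A test case along the following lines would help. This is only a sketch of a probe (the class and method names are hypothetical): the integrator pushes one element and returns false, and the finisher then attempts one more push. It records what Downstream::push returns at each point so the output can be compared across JDK builds; I am deliberately not stating an expected result for the finisher's push here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

final class IntegrateFalseProbe {

    // Runs a small sequential pipeline and logs what the Gatherer observes.
    static List<String> probe() {
        List<String> log = new ArrayList<>();

        // Pushes the element, then signals that no further integration is desired.
        Gatherer.Integrator<Void, Integer, Integer> integrator =
                (unused, element, downstream) -> {
                    boolean accepted = downstream.push(element);
                    log.add("integrate(" + element + ") pushed=" + accepted);
                    return false;
                };

        // Tries to emit one more element after the integrator has returned false.
        BiConsumer<Void, Gatherer.Downstream<? super Integer>> finisher =
                (unused, downstream) ->
                        log.add("finisher pushed=" + downstream.push(-1));

        List<Integer> result = Stream.of(1, 2, 3)
                .gather(Gatherer.ofSequential(integrator, finisher))
                .toList();
        log.add("result=" + result);
        return log;
    }

    public static void main(String[] args) {
        probe().forEach(System.out::println);
    }
}
```

Whether the "finisher pushed=" line reports true, and whether -1 shows up in the result, is exactly the behavior in question.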




Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Anthony Vanelverdinghe <dev at anthonyv.be>
Sent: Saturday, 27 July 2024 08:57
To: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Subject: Stream Gatherers (JEP 473) feedback

When writing factory methods for Gatherers, there's sometimes a
degenerate case that requires returning a no-op Gatherer. So I'd like a
way to mark a no-op Gatherer as such, allowing the Stream implementation
to recognize and eliminate it from the pipeline. One idea is to add
Gatherer.defaultIntegrator(), analogous to the other default… methods.
Another is to add Gatherers.identity(), analogous to Function.identity().

Sometimes a factory method returns a Gatherer that only works correctly
if the upstream has certain characteristics, for example
Spliterator.SORTED or Spliterator.DISTINCT. One idea is to add methods
like Gatherers.sorted() and Gatherers.distinct(), where the Stream
implementation would be able to recognize and eliminate these from the
pipeline if the upstream already has these characteristics. That way
we'd be able to write `return Gatherers.sorted().andThen(…);`. Another
idea is to provide a Gatherer with a way to inspect the upstream
characteristics. If the upstream is missing the required
characteristic(s), it could then throw an IllegalStateException.

The returns clause of Gatherer.Integrator::integrate just states "true
if subsequent integration is desired, false if not". In particular, it
doesn't document the behavior I'm observing, that returning false also
causes downstream to reject any further output elements.

In the Implementation Requirements section of Gatherer, rephrasing
"Outputs and state later in the input sequence will be discarded if
processing an earlier partition short-circuits." to something like the
following would be clearer to me: "As soon as any partition
short-circuits, the whole Gatherer short-circuits. The state of other
partitions is discarded, i.e. there are no further invocations of the
combiner. The finisher is invoked with the short-circuiting partition's
state." I wouldn't mention discarding of outputs, since that's implied
by the act of short-circuiting.

Anthony
