RFR: JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview)

Viktor Klang vklang at openjdk.org
Wed Nov 8 15:41:22 UTC 2023


On Sun, 5 Nov 2023 17:55:39 GMT, Tagir F. Valeev <tvaleev at openjdk.org> wrote:

>> This is a Draft PR for [JEP-461](https://openjdk.org/jeps/461)
>
> Very solid work, thank you! See some minor comments inline. I actually have much more ideas of specific gatherers, but they could be discussed separately.
> 
> One thing that bothers me is that Gatherer is similar to Collector, yet the APIs are totally different, so users may be confused. I like the Gatherer API much more and I see that evolving current Collector API might be hard, given tons of third-party implementations, so I don't see a good way to fix this. But probably we should provide Collector-to-Gatherer adapter (producing one-element stream)? And in general, it would be nice to use exactly-one-element-producing-gatherers as terminal operations, without explicit findFirst().get().

Thanks for the feedback @amaembo! I hope you find the updates reasonable :)

> src/java.base/share/classes/java/util/stream/AbstractPipeline.java line 191:
> 
>> 189:      * existing pipeline.
>> 190:      *
>> 191:      * The previous stage must be unlinked and unconsumed.
> 
> Note that linebreak in Javadoc is normally ignored. It's probably better to add an explicit `<p>`. This way it will be rendered better in IDE documentation view.

@amaembo That's true, but given that this is internal API which won't be a part of the JDK Javadoc I think it is safe to address this at a later point. 👍

> src/java.base/share/classes/java/util/stream/Gatherer.java line 99:
> 
>> 97:  * public static <T, R> Gatherer<T, ?, R> map(Function<? super T, ? extends R> mapper) {
>> 98:  *     return Gatherer.of(
>> 99:  *         (unused, element, downstream) -> // integrator
> 
> As [JEP 456](https://openjdk.org/jeps/456) is already integrated, probably it's reasonable to use `(_, element, downstream) ->` here?

I thought this over for a while, and given that most developers won't be used to that feature yet I think it is clearer for the first preview to keep `unused` here.

> src/java.base/share/classes/java/util/stream/Gatherer.java line 119:
> 
>> 117:  *
>> 118:  * AS an example, in order to create a gatherer to implement a sequential
>> 119:  * Prefix Scan as a Gatherer, it could be done the following way:
> 
> Do we need 'Prefix Scan' in title case?

The reason for capitalizing it was that it is a specific term with specific meaning. Capitalizing it makes it stand out and let people do their own research if they want/need to learn more about Prefix Scans.

> src/java.base/share/classes/java/util/stream/Gatherer.java line 308:
> 
>> 306:      */
>> 307:     static <T, R> Gatherer<T, Void, R> ofSequential(
>> 308:             Integrator<Void, T, R> integrator) {
> 
> Probably PECS signature `? super T, ? extends R` could be useful here?

I've thought about that, and since we don't use that for the other parameters (like `initializer`) I don't think it helps changing it here. And the reason this has little effect is that the code calling the factory typically controls the components/parameters/return type.

> src/java.base/share/classes/java/util/stream/Gatherers.java line 389:
> 
>> 387:      * @throws IllegalArgumentException when windowSize is less than 1
>> 388:      */
>> 389:     public static <TR> Gatherer<TR, ?, List<TR>> windowSliding(int windowSize) {
> 
> From my experience, sliding window with windowSize = 2 is the most important and deserves the explicit support (I have such an operation in my StreamEx library and it's used quite often). Please consider adding (name subject to discussions):
> 
> `public static <T, R> Gatherer<T, ?, R> pairMap(BiFunction<T, T, R> functionToApplyToTwoAdjacentElements)`. It could be more optimal, as you only need to store the previous element. Moreover, I believe one can implement a proper combiner to make it parallel-friendly. Could be added separately after this PR is merged. I can contribute it.

@amaembo pairMap should be straight-forward to implement. And regarding parallelization I don't think that will necessarily work since it would not be able to run incrementally (depth-first).

> src/java.base/share/classes/java/util/stream/Gatherers.java line 403:
> 
>> 401:                 // Integrator
>> 402:                 Integrator.ofGreedy((window, e, downstream) -> {
>> 403:                     window.addLast(e);
> 
> Too bad it will reject null elements in the current implementation. This should be documented, or better fixed. We can use ArrayList and keep track of the current start index, and export slides manually using arraycopy twice and  `listFromTrustedArrayNullsAllowed`. It's only a little bit trickier and should not be slower than the current implementation. I can write it if you don't want to do it by yourself :-)

Took a stab at reimplementing both windowFixed and windowSliding to allow null elements in the stream: https://github.com/openjdk/jdk/pull/16420/commits/57fea44ce71ae94107f3fb6d7f104d35839371d9

> src/java.base/share/classes/java/util/stream/Gatherers.java line 436:
> 
>> 434:      *     Stream.of(1,2,3,4,5,6,7,8,9)
>> 435:      *           .gather(
>> 436:      *               Gatherers.fold(() -> "", (string, number) -> string + number)
> 
> This particular sample is not great, as it's totally replaceable with reduce() (`map(String::valueOf).reduce(String::concat)`), so users may wonder why using fold instead. I can suggest another sample which I show everywhere:
> `anyList.stream().gather(Gatherers.fold(() -> 1, (acc, e) -> acc * 31 + Objects.hashCode(e))).findFirst().get();`
> (haven't tested, but the idea is like this)
> It computes the hashCode of the list, according to the list specification. Here, the accumulator is inherently non-associative, so if you replace it with `reduce()` and make the stream parallel, it will yield the wrong result. But `Gatherers.fold` will work!
> 
> If you don't like it, we can think up something else, but concatenation sample is certainly not good.

Perhaps the best thing here is to change such that the supplier supplies something non-zero like "String: "?

> src/java.base/share/classes/java/util/stream/Gatherers.java line 450:
> 
>> 448:      * @throws NullPointerException if any of the parameters are null
>> 449:      */
>> 450:     public static <T, R> Gatherer<T, ?, R> fold(
> 
> Should it be explicitly named as `foldLeft`? Because `foldRight` is also possible (though will require complete stream buffering). Also see [JDK-8133680](https://bugs.openjdk.org/browse/JDK-8133680) and [JDK-8292845](https://bugs.openjdk.org/browse/JDK-8292845) (probably should be linked to gatherer's issue, and read the comment about naming)

TBH I don't think `foldRight` makes much sense for potentially unbounded structures such as Stream. In the case you need it, it might be better to export it to a List and then reversing it.

> src/java.base/share/classes/java/util/stream/Gatherers.java line 482:
> 
>> 480:      * @throws NullPointerException if any of the parameters are null
>> 481:      */
>> 482:     public static <T, R> Gatherer<T, ?, R> scan(
> 
> Similarly, probably should be named `scanLeft`?

That's a good question, and here's my thinking—`scanRight` doesn't make any sense for a construct which supports unboundedness, so if we were discussing *Collections* I'd be more inclined to agree (but it is more likely that reversing the collection and then calling scanLeft would be better).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16420#issuecomment-1798197056
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1382925138
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1382927064
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383126657
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383135200
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383597640
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383603310
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383147855
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383605841
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383151321


More information about the core-libs-dev mailing list