RFR: JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview)

Wed Nov 8 16:34:06 UTC 2023

On Mon, 6 Nov 2023 16:18:22 GMT, Viktor Klang <vklang at openjdk.org> wrote:

>> src/java.base/share/classes/java/util/stream/Gatherers.java line 389:
>> 
>>> 387:      * @throws IllegalArgumentException when windowSize is less than 1
>>> 388:      */
>>> 389:     public static <TR> Gatherer<TR, ?, List<TR>> windowSliding(int windowSize) {
>> 
>> From my experience, sliding window with windowSize = 2 is the most important and deserves the explicit support (I have such an operation in my StreamEx library and it's used quite often). Please consider adding (name subject to discussions):
>> 
>> `public static <T, R> Gatherer<T, ?, R> pairMap(BiFunction<T, T, R> functionToApplyToTwoAdjacentElements)`. It could be more optimal, as you only need to store the previous element. Moreover, I believe one can implement a proper combiner to make it parallel-friendly. Could be added separately after this PR is merged. I can contribute it.
>
> @amaembo pairMap should be straight-forward to implement. And regarding parallelization I don't think that will necessarily work since it would not be able to run incrementally (depth-first).

Sorry, I'm not sure what do you mean by incrementally. But I've realized that to implement this you may need to push to downstream during combining. E.g., imagine the stream of numbers 1...1000, and you want to make `parallel().gather(pairMap((left, right) -> left+"->"+right)).collect(toList())`, creating a list of "1->2", "2->3", ..., "999->1000" strings. Now, let's assume that the stream was split into four parts during the parallelization: `1..250`, `251..500`, `501..750`, `751..1000`. So you create downstream accumulators and push the pairs there with integrator, like:
"1->2", ..., "249->250" into first toList() accumulator, 
"251->252", ..., "499->500" into second toList() accumulator
"501->502", ..., "749->750" into third toList() accumulator
"751->752", ..., "999->1000" into fourth toList() accumulator
At this point we have four independent gatherer states, and they keep the first and last element from the upstream. Now, you combine for example, first and second gatherer states. The first one is [first=1, last=250], and the second is [first=251, last=500]. The combined gatherer should be [first=1, last=500] and at the same time during the combining it should push the "250->251" object to the downstream, which will land into the first toList() accumulator.

Hopefully my explanation is clear. Will it be possible to extend the combiner function and provide the Downstream there as well, so combiner can also work as a finisher for the first combined chunk?

>> src/java.base/share/classes/java/util/stream/Gatherers.java line 436:
>> 
>>> 434:      *     Stream.of(1,2,3,4,5,6,7,8,9)
>>> 435:      *           .gather(
>>> 436:      *               Gatherers.fold(() -> "", (string, number) -> string + number)
>> 
>> This particular sample is not great, as it's totally replaceable with reduce() (`map(String::valueOf).reduce(String::concat)`), so users may wonder why using fold instead. I can suggest another sample which I show everywhere:
>> `anyList.stream().gather(Gatherers.fold(() -> 1, (acc, e) -> acc * 31 + Objects.hashCode(e))).findFirst().get();`
>> (haven't tested, but the idea is like this)
>> It computes the hashCode of the list, according to the list specification. Here, the accumulator is inherently non-associative, so if you replace it with `reduce()` and make the stream parallel, it will yield the wrong result. But `Gatherers.fold` will work!
>> 
>> If you don't like it, we can think up something else, but concatenation sample is certainly not good.
>
> Perhaps the best thing here is to change such that the supplier supplies something non-zero like "String: "?

Such an example still looks artificial for me, as it's easy to rewrite with `reduce()` (just prepend with prefix outside of the stream). Well, it's up to you.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1386896007
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1386898587