<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Cay!<br>
<br>
>AFAIK, the "greedy/short-circuiting" decision point doesn't have a major impact on performance either. Or am I wrong there?</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature" class="elementToProof" style="color: inherit;">
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
One of my personal goals was to arrive at a point where the built-in intermediate Stream operations could be implemented as Gatherers with roughly the same or better performance. A greedy Integrator needs less signal tracking, which, in aggregate over several operations,
can be a noticeable advantage. Keep in mind that this is primarily about sequential performance; in my experience, sequential streams are vastly more common than parallel streams.<br>
<br>
>If I use the factory methods, I have to make a choice between of/ofGreedy<br>
<br>
Back when GREEDY was a Characteristic, you'd still have had the choice of adding it or not. Since it was served on the side, developers would have been much less likely to know that the choice existed, and it wouldn't have been connected to the Integrator
itself. of() gets the short name because it is more capable than ofGreedy() (which does not support short-circuiting), so using ofGreedy() signals a choice of less capability.</div>
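<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
To make the of()/ofGreedy() distinction concrete, here is a minimal sketch, assuming JDK 24 or later (where Stream Gatherers are final). The class name and the helpers doubled() and whileBelow() are hypothetical and only for illustration; this is not how the built-in operations are implemented.</div>

```java
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Illustrative only: hypothetical helpers contrasting the two Integrator factories.
final class GreedyVersusShortCircuiting {

    // Greedy: never ends the stream on its own initiative; the returned
    // boolean only relays whether the downstream still wants elements.
    static Gatherer<Integer, ?, Integer> doubled() {
        return Gatherer.ofSequential(
            Gatherer.Integrator.ofGreedy(
                (unused, element, downstream) -> downstream.push(element * 2)));
    }

    // Short-circuiting: returns false by itself once an element reaches the limit.
    static Gatherer<Integer, ?, Integer> whileBelow(int limit) {
        return Gatherer.ofSequential(
            Gatherer.Integrator.of(
                (unused, element, downstream) -> element < limit && downstream.push(element)));
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3).gather(doubled()).toList());          // [2, 4, 6]
        System.out.println(Stream.of(1, 2, 9, 3).gather(whileBelow(5)).toList());   // [1, 2]
    }
}
```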
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
>and ofSequential/of.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The reason for ofSequential/of instead of ofParallel/of is that a parallel-capable Gatherer covers more scenarios than a sequential-only version, so giving the shorter name to the less-capable version would not stand out in the same way when reviewing. Also,
Collector.of is parallel-capable, so it seemed more consistent to make Gatherer.of parallel-capable as well.<br>
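<br>
As a rough illustration of the ofSequential/of split, the sketch below assumes JDK 24 or later. The class name, the Sum state type, and the helpers runningTotal() and total() are hypothetical. The sequential-only version supplies no combiner, while the parallel-capable one supplies a combiner in the same way Collector.of does.<br>

```java
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Illustrative only: hypothetical helpers contrasting ofSequential with of.
final class SequentialVersusParallelCapable {

    // Mutable state used by both sketches (hypothetical helper type).
    static final class Sum { long value; }

    // Sequential-only: every output depends on all earlier elements,
    // so no combiner is supplied.
    static Gatherer<Integer, ?, Long> runningTotal() {
        return Gatherer.ofSequential(
            Sum::new,
            Gatherer.Integrator.ofGreedy((sum, element, downstream) -> {
                sum.value += element;
                return downstream.push(sum.value);
            }));
    }

    // Parallel-capable: partial sums are merged by the combiner,
    // and the finisher emits the single result.
    static Gatherer<Integer, ?, Long> total() {
        return Gatherer.of(
            Sum::new,
            Gatherer.Integrator.ofGreedy((sum, element, downstream) -> {
                sum.value += element;
                return true; // nothing is emitted until the finisher runs
            }),
            (left, right) -> { left.value += right.value; return left; },
            (sum, downstream) -> downstream.push(sum.value));
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).gather(runningTotal()).toList());      // [1, 3, 6, 10]
        System.out.println(Stream.of(1, 2, 3, 4).parallel().gather(total()).toList());  // [10]
    }
}
```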
<br>
>First off, I think factory methods should be the favored approach.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Is there something that could be made clearer around that? That's how I already view it.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
>I have seen some gatherer implementations that don't use the factory methods, even though they could have. Is this flexibility useful? </div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Yes, being able to move from an in-line definition, to a class-local implementation, to a dedicated file implementation provides a gradual path based on user needs and operation evolution. That's the same as for Collector.<br>
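<br>
A minimal sketch of two points on that path, assuming JDK 24 or later: the same operation defined in-line through the factory methods, and as a dedicated class that implements Gatherer directly. DedupeForms, Last, viaFactory() and Dedupe are hypothetical names. The class form relies on the inherited combiner(), which returns defaultCombiner(), and on Integrator.ofGreedy to obtain the Greedy marker subtype.<br>

```java
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Illustrative only: drops consecutive duplicates, written in two forms.
final class DedupeForms {

    // Remembers the previously seen element (hypothetical helper type).
    static final class Last {
        private Object value;
        private boolean seen;

        // Reports whether element equals its predecessor, then remembers it.
        boolean repeats(Object element) {
            boolean repeat = seen && Objects.equals(value, element);
            value = element;
            seen = true;
            return repeat;
        }
    }

    // Form 1: in-line definition through the factory methods.
    static <T> Gatherer<T, ?, T> viaFactory() {
        return Gatherer.ofSequential(
            Last::new,
            Gatherer.Integrator.ofGreedy(
                (last, element, downstream) -> last.repeats(element) || downstream.push(element)));
    }

    // Form 2: dedicated class implementing the Gatherer interface.
    static final class Dedupe<T> implements Gatherer<T, Last, T> {
        @Override
        public Supplier<Last> initializer() {
            return Last::new;
        }

        @Override
        public Gatherer.Integrator<Last, T, T> integrator() {
            // ofGreedy returns an Integrator.Greedy, the marker subtype.
            return Gatherer.Integrator.ofGreedy(
                (last, element, downstream) -> last.repeats(element) || downstream.push(element));
        }
        // combiner() is inherited and returns defaultCombiner(): sequential-only.
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 1, 2, 2, 2, 1).gather(DedupeForms.<Integer>viaFactory()).toList()); // [1, 2, 1]
        System.out.println(Stream.of(1, 1, 2, 2, 2, 1).gather(new Dedupe<Integer>()).toList());             // [1, 2, 1]
    }
}
```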
<br>
>The details are fussy, with the marker interface and the magic default combiner.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I think this is worth expanding on a bit: in what way are they fussy?</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
>That's where I thought the characteristics approach is a better API. It has precedent, and it is unfussy.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
See the archives for the discussion around why Characteristics (which were there for a long time) didn't make it.<br>
<br>
>I realize it's not a big deal, and I was going to let it slide. Until I heard Brian's Devoxx talk where he mentioned "peak complexity", and I felt, that's, in a small way, what is present here.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I completely understand your line of thinking, and I'm not sure what'll convince you that we're quite far from the peak (you should have seen some of my prototypes! 😄). It is important to remember that there's a pretty vast space of use-cases that need to
be supported, so designing an abstraction which makes sense for a 2-element sequential stream as well as a 1m-element parallel stream, and everything in between, has proven to be elusive ever since Streams were introduced in Java 8.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I think it is also worth noting that <b>usage</b> of Gatherers will by far outnumber
<b>definition</b> of Gatherers, so appropriate emphasis needs to be placed on usage ergonomics, capabilities, and runtime performance.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Cheers,<br>
√</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<b><br>
</b></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<b>Viktor Klang</b></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Software Architect, Java Platform Group<br>
Oracle</div>
</div>
<div id="appendonsend" style="color: inherit;"></div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<hr style="display: inline-block; width: 98%;">
<div id="divRplyFwdMsg" dir="ltr" style="color: inherit;"><span style="font-family: Calibri, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);"><b>From:</b> core-libs-dev <core-libs-dev-retn@openjdk.org> on behalf of Cay Horstmann <cay.horstmann@gmail.com><br>
<b>Sent:</b> Monday, 14 October 2024 21:27<br>
<b>To:</b> core-libs-dev@openjdk.org <core-libs-dev@openjdk.org><br>
<b>Subject:</b> Re: Fw: New candidate JEP: 485: Stream Gatherers</span>
<div> </div>
</div>
<div style="font-size: 11pt;">Hi Viktor,<br>
<br>
thanks for your clarifications.<br>
<br>
I agree that from a performance point of view, there isn't all that much to be gained. I thought more about parallelizing distinctBy and windowSliding. Perhaps one can squeeze out a modest gain, but I am not excited by the potential.<br>
<br>
AFAIK, the "greedy/short-circuiting" decision point doesn't have a major impact on performance either. Or am I wrong there?<br>
<br>
In my mind, given that performance is not worth more than maybe a tweak, this amplifies my first issue with the surface API.<br>
<br>
I started out thinking that almost nobody is going to write a gatherer, so why worry? But I found myself writing a couple in the last few days. And I wonder whether the current API is at "peak complexity".<br>
<br>
If I use the factory methods, I have to make a choice between of/ofGreedy and ofSequential/of.<br>
<br>
And if I don't use the factory methods, I have to mess with a marker interface or a method yielding a magic default.<br>
<br>
Is there some virtuous collapse?<br>
<br>
First off, I think factory methods should be the favored approach. And "of" should be the safe choice. That's why I would rename ofSequential into of, and introduce ofParallel for optimization. Like with of/ofGreedy.<br>
<br>
I have seen some gatherer implementations that don't use the factory methods, even though they could have. Is this flexibility useful? The details are fussy, with the marker interface and the magic default combiner. That's where I thought the characteristics
approach is a better API. It has precedent, and it is unfussy.<br>
<br>
I realize it's not a big deal, and I was going to let it slide. Until I heard Brian's Devoxx talk where he mentioned "peak complexity", and I felt, that's, in a small way, what is present here.<br>
<br>
Cheers,<br>
<br>
Cay<br>
<br>
--<br>
<br>
(Moving this to core-libs-dev)<br>
<br>
Cay,<br>
<br>
Regarding 1, Characteristics was a part of the Gatherers contract for a very long time, but alas, it didn't end up being worth its cost. There's a longer discussion on the topic here:
<a href="https://mail.openjdk.org/pipermail/core-libs-dev/2024-January/118138.html" id="OWA795dc066-ba46-6521-c1cf-41bfff52c11a" class="OWAAutoLink" data-auth="NotApplicable">
https://mail.openjdk.org/pipermail/core-libs-dev/2024-January/118138.html</a> (and I'm sure that there were more, but that's the one which comes to mind)<br>
<br>
Regarding 2, I did have a prototype which had a Downstream in the Combiner, but it created a new dimension of complexity which made it even harder for people to move from sequential to parallelizable. The door isn't closed on it, but I remain unconvinced it's
worth the surface area for performance reasons.<br>
<br>
As a bit of a side note, it's worth knowing that in the reference stream implementation it is not unusual for parallel-capable stages to be executed as "islands", which means that short-circuiting signals cannot travel across those islands. Since parallel-capable
Gatherers can be fused to execute in the same "island", if we get to a place where "all" intermediate operations are parallel-capable Gatherers, there'd be a single end-to-end "island", and hence the ability to propagate short-circuiting would be preserved
in all modes of execution. It's also worth knowing that a `gather(…)` immediately followed by a `collect(…)` can be fused to run together.<br>
<br>
Cheers,<br>
√<br>
<br>
<br>
Viktor Klang<br>
Software Architect, Java Platform Group<br>
Oracle<br>
<br>
________________________________<br>
From: jdk-dev <jdk-dev-retn at openjdk.org> on behalf of Cay Horstmann <cay.horstmann at gmail.com><br>
Sent: Friday, 4 October 2024 19:58<br>
To: jdk-dev at openjdk.org <jdk-dev at openjdk.org><br>
Subject: Re: New candidate JEP: 485: Stream Gatherers<br>
<br>
Hi, I have some belated questions about the design choices in this API that I could not find addressed in the JEP.<br>
<br>
1. Why aren't characteristics used to express "greediness/short-circuiting" or "sequentialness/parallelizability"?<br>
<br>
I understand that for the former I use ofGreedy/of, or implement Gatherer.Integrator.Greedy/Gatherer.Integrator. And for the latter, I use ofSequential/of, or, if I implement the Gatherer interface, have the combiner return defaultCombiner() or not.<br>
<br>
But it seems a bit complex and less familiar than the characteristics mechanism that exists for spliterators, streams, and collectors.<br>
<br>
The original design document (<a href="https://cr.openjdk.org/~vklang/Gatherers.html" id="OWA3ba589d4-76b5-dfc3-1661-8e4334217957" class="OWAAutoLink" data-auth="NotApplicable">https://cr.openjdk.org/~vklang/Gatherers.html</a>) used characteristics, so I wonder
what motivated the change.<br>
<br>
2. Why wasn't the combiner() designed to allow pushing of elements to the end of the first range's sink? Then distinctBy could be parallelized without buffering the elements. More generally, with some state fiddling one can then handle the elements around range
splits.<br>
<br>
As it is, I don't see how to parallelize such computations other than to buffer all elements.<br>
<br>
I looked at the project at <a href="https://github.com/jhspetersson/packrat" id="OWA96f3582f-1a25-402e-83a3-3dc3140b1718" class="OWAAutoLink" data-auth="NotApplicable">
https://github.com/jhspetersson/packrat</a> that implements a number of gatherers. Only one uses a combiner, to join buffers until their contents can be pushed in the finisher.<br>
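<br>
For reference, the buffering approach described above could look roughly like the sketch below, assuming JDK 24 or later. BufferedDistinctBy and distinctBy are hypothetical names, and this is not taken from the packrat project. Because the combiner only receives two states, each segment's elements are buffered, merged, and pushed only from the finisher.<br>

```java
import java.util.LinkedHashMap;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Illustrative only: a parallel-capable distinctBy that buffers every element.
final class BufferedDistinctBy {

    // Keeps the first element seen for each key, in encounter order.
    static <T, K> Gatherer<T, ?, T> distinctBy(Function<? super T, ? extends K> keyOf) {
        // One insertion-ordered buffer per parallel segment.
        Supplier<LinkedHashMap<K, T>> newBuffer = LinkedHashMap::new;
        return Gatherer.of(
            newBuffer,
            Gatherer.Integrator.ofGreedy((buffer, element, downstream) -> {
                buffer.putIfAbsent(keyOf.apply(element), element);
                return true; // nothing is emitted while integrating
            }),
            // The combiner has no Downstream, so it can only merge buffers;
            // entries from the earlier (left) segment win.
            (left, right) -> {
                right.forEach(left::putIfAbsent);
                return left;
            },
            // Only the finisher can push the buffered elements.
            (buffer, downstream) -> {
                for (T element : buffer.values()) {
                    if (!downstream.push(element)) {
                        break;
                    }
                }
            });
    }

    public static void main(String[] args) {
        System.out.println(
            Stream.of("a", "bb", "cc", "d", "eee")
                .parallel()
                .gather(BufferedDistinctBy.<String, Integer>distinctBy(String::length))
                .toList()); // [a, bb, eee]
    }
}
```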
<br>
Cheers,<br>
<br>
Cay<br>
--<br>
<br>
Cay S. Horstmann | <a href="https://horstmann.com" id="OWAb6bedce0-1fd8-54ad-f35d-9421c8b9508b" class="OWAAutoLink" data-auth="NotApplicable">
https://horstmann.com</a><br>
<br>
</div>
</body>
</html>