RFR: JDK-8205461 Create Collector which merges results of two other collectors
Stuart Marks
stuart.marks at oracle.com
Tue Sep 25 00:33:04 UTC 2018
Webrev looks good.
In the CSR, I updated the webrev link to point to the latest, I set the
fix-version to 12, and I set the scope to SE. I've marked the CSR reviewed.
The next thing is for you to mark the CSR as Finalized.
Thanks,
s'marks
On 9/24/18 3:39 AM, Tagir Valeev wrote:
> Ok, teeing. Webrev updated:
> http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r6/
> CSR updated accordingly:
> https://bugs.openjdk.java.net/browse/JDK-8209685
>
> With best regards,
> Tagir Valeev.
> On Fri, Sep 21, 2018 at 8:26 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>
>> The example of ISS is a good one. It is analogous to the question of
>> "when is it right to write a class, and when it is right to write a
>> function?" And the answer is, of course, "it depends." ISS was an
>> obvious grouping, but even there, there was significant disagreement
>> during its design about what it should support and not (especially with
>> regard to sum-of-squares calculations), and extra work done to make it
>> extensible. If you're writing from scratch, you might well consider
>> writing something like ISS.
>>
>> But ... the whole motivation for having "teeing" _at all_ is that you
>> have some existing collectors you want to reuse! It seems a little
>> silly to claim "I definitely will want to reuse two collectors, so much
>> so that we need a new method, but can't imagine ever wanting to reuse
>> three."
>>
>> So, while I am not saying we have to solve the N-way problem now, I
>> think we'd be silly to pick a naming scheme that falls apart when we try
>> to go past two. So I'm still at "teeing". It works for two, and it
>> works for larger numbers as well.
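For reference, this is what reusing three existing collectors looks like with only the two-way form: one `teeing` call has to be nested inside another, and the inner merger must package two results into a single value before the outer merger sees it. This is a sketch assuming JDK 12 or later (where `Collectors.teeing` is available); the class name, method name and output format are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class NestedTeeing {

    // Reuses three existing collectors (counting, joining, summingInt).
    // With a two-way teeing, the third collector forces a nested call,
    // and the inner merger has to squeeze two results into one String.
    static String describe(List<String> words) {
        Collector<String, ?, String> joinedWithLength = Collectors.teeing(
                Collectors.joining(","),
                Collectors.summingInt(String::length),
                (joined, chars) -> joined + " (" + chars + " chars)");
        return words.stream().collect(Collectors.teeing(
                Collectors.counting(),
                joinedWithLength,
                (count, rest) -> count + " words: " + rest));
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of("a", "bb", "ccc"))); // prints "3 words: a,bb,ccc (6 chars)"
    }
}
```

Each additional collector adds another level of nesting and another intermediate packaging step, which is the cost an N-way form would remove.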
>>
>> On 9/16/2018 5:23 AM, Tagir Valeev wrote:
>>> Hello, Brian!
>>>
>>> Regarding more than two collectors. Some libraries definitely have
>>> analogs (e.g. [1]) which combine more than two collectors. In my
>>> opinion, combining two collectors this way is an upper limit for
>>> readable code, especially if you are going to collect into a list: you
>>> will have a list of untyped and unnamed results which positionally
>>> correspond to the collectors. If you have more than two collectors to
>>> combine, writing a separate accumulator class with accept/combine
>>> methods and creating a collector from scratch would be much easier
>>> to read and support. A good example is IntSummaryStatistics and the
>>> corresponding summarizingInt collector. It could be emulated by combining
>>> four collectors (maxBy, minBy, summingInt, counting), but having a
>>> dedicated class IntSummaryStatistics which does all four things
>>> explicitly is much better. It could be easily reused outside of Stream
>>> API context, it has well-named and well-typed accessor methods and it
>>> may contain other domain-specific methods like average(). Imagine if
>>> it were a List of four elements and you had to call summary.get(1) to
>>> get a maximum. So I think that supporting more than two collectors
>>> would encourage obscure programming.
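To make this contrast concrete, here is a sketch (JDK 12+; `SummaryVsTeeing` and its method names are hypothetical) that gets the minimum and maximum both ways. `summarizingInt` returns an `IntSummaryStatistics` with named accessors, while teeing `minBy` and `maxBy` leaves the merger to choose a container, here a `List` whose elements are identified only by position.

```java
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummaryVsTeeing {

    // Dedicated accumulator: the result type names every statistic.
    static IntSummaryStatistics viaSummary(List<Integer> xs) {
        return xs.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    // Teeing two reused collectors: the List records results by position
    // only (index 0 = min, index 1 = max).
    // Assumes xs is non-empty; orElseThrow() throws NoSuchElementException otherwise.
    static List<Integer> viaTeeing(List<Integer> xs) {
        return xs.stream().collect(Collectors.teeing(
                Collectors.minBy(Comparator.<Integer>naturalOrder()),
                Collectors.maxBy(Comparator.<Integer>naturalOrder()),
                (min, max) -> List.of(min.orElseThrow(), max.orElseThrow())));
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(3, 1, 4, 1, 5, 9, 2, 6);
        IntSummaryStatistics stats = viaSummary(xs);
        System.out.println("summary: min=" + stats.getMin() + ", max=" + stats.getMax());
        List<Integer> minMax = viaTeeing(xs);
        System.out.println("teeing:  min=" + minMax.get(0) + ", max=" + minMax.get(1));
    }
}
```

The teeing version is shorter to write, but a reader has to know that `get(1)` means "maximum", which is exactly the readability problem described above.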
>>>
>>> With best regards,
>>> Tagir Valeev
>>>
>>> [1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple.java#L1282
>>> (don't ask me why!)
>>>
>>> On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>>> tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely
>>>> to be a name we regret, for reasons outlined below.
>>>>
>>>>
>>>> The behavior of this Collector is:
>>>> - duplicate the stream into two identical streams
>>>> - collect the two streams with two collectors, yielding two results
>>>> - merge the two results into a single result
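Those three steps map directly onto `Collectors.teeing(downstream1, downstream2, merger)`, the name this thread settled on (JDK 12+). A minimal sketch, with a hypothetical class name and sample data, computing an average from a sum and a count:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingAverage {

    // Step 1: every element is fed to both downstream collectors.
    // Step 2: summingDouble and counting each produce their own result.
    // Step 3: the merger combines the two results into one value.
    // An empty list yields 0.0 / 0, which is NaN.
    static double average(List<Integer> xs) {
        return xs.stream().collect(Collectors.teeing(
                Collectors.summingDouble(Integer::doubleValue),
                Collectors.counting(),
                (sum, count) -> sum / count));
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(1, 2, 3, 4))); // prints 2.5
    }
}
```

The stream is not literally copied; each element is passed to both accumulators, which is what makes "duplicate the stream" a conceptual rather than physical description.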
>>>>
>>>> Obviously, a name like `duplexingAndCollectingAndThenMerging`, which,
>>>> while entirely accurate and explanatory, is "a bit" unwieldy. So the
>>>> questions are:
>>>> - how much can we drop and still be accurate
>>>> - which parts are best to drop.
>>>>
>>>> When we pick names, we are not just trying to pick the best name for
>>>> now, but we should imagine all the possible operations one might ever
>>>> want to do in the future (names in the JDK are forever) and make a
>>>> reasonable attempt to imagine whether this could cause confusion or
>>>> regret in the future.
>>>>
>>>> To evaluate "duplexing" here (which seems the most important thing to
>>>> keep), I'd ask: is there any other reasonable way to imagine a
>>>> `duplexing` collect operation, now or in the future?
>>>>
>>>> One could imagine wanting an operation that takes a stream and produces
>>>> two streams whose contents are that of the original stream. And
>>>> "duplex" is a good name for that. But, it is not a Collector; it would
>>>> be a stream transform, like concat. So that doesn't seem a conflict; a
>>>> duplexing collector and a duplexing stream transform are sort of from
>>>> "different namespaces."
>>>>
>>>> Can one imagine a "duplexing" Collector that doesn't do any collection?
>>>> I cannot. Something that returns a pair of streams would not be a
>>>> Collector, but something else. So dropping AndCollecting seems justified.
>>>>
>>>> What about "AndThenMerging"? The purpose of collect is to reduce the
>>>> stream into a summary description. Can we imagine a duplexing operation
>>>> that doesn't merge the two results, but instead just returns a tuple of
>>>> the results? Yes, I can totally imagine this, especially once we have
>>>> value types and records, which make returning ad-hoc tuples cheaper
>>>> (syntactically, heap-wise, CPU-wise.) So I think this is quite a
>>>> reasonable possibility. But I would have no problem with an overload
>>>> that didn't take a merger and returned a tuple of the results, and was
>>>> still called `duplexing`.
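Until such an overload exists, a rough stand-in is to pass a pair-building function as the merger. A sketch, assuming JDK 12+ and using `Map.entry` (JDK 9+) as an ad-hoc tuple; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TeeingPair {

    // No real merging: the "merger" just packages both results as a pair.
    // Map.entry rejects null keys and values, which is fine here because
    // counting and summingInt never produce null.
    static Map.Entry<Long, Integer> countAndSum(List<Integer> xs) {
        return xs.stream().collect(Collectors.teeing(
                Collectors.counting(),
                Collectors.summingInt(Integer::intValue),
                (count, sum) -> Map.entry(count, sum)));
    }

    public static void main(String[] args) {
        Map.Entry<Long, Integer> result = countAndSum(List.of(1, 2, 3));
        System.out.println(result.getKey() + " " + result.getValue()); // prints "3 6"
    }
}
```

With records (standard since JDK 16), a small two-component record would give the results real names instead of "key" and "value".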
>>>>
>>>> So I'm fine with dropping all the extra AndThisAndThat.
>>>>
>>>> Finally, there's one other obvious direction we might extend this --
>>>> more than two collectors. There's no reason why we can only do two; we
>>>> could take a (likely homogeneous) varargs of Collectors, and return a
>>>> List of results -- which itself could then be streamed into another
>>>> collector. This actually sounds pretty useful (though I'm not
>>>> suggesting doing this right now.) And, I think it would be silly if this
>>>> were not called the same thing as the two-collector version (just as it
>>>> would be silly to have separate names for "concat two" and "concat n".)
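A sketch of how such an N-way variant might look. `teeingAll` is a hypothetical name, not a JDK method, and this version assumes homogeneous collectors (same input type and same result type). It creates one container per downstream collector, feeds every element to all of them, merges position by position when a parallel stream splits, and returns the finished results as a `List` in argument order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class TeeingAll {

    // Hypothetical N-way teeing: NOT part of java.util.stream.Collectors.
    // All downstreams share the input type T and the result type R.
    @SafeVarargs
    @SuppressWarnings("unchecked")
    static <T, R> Collector<T, ?, List<R>> teeingAll(Collector<? super T, ?, R>... downstreams) {
        // View each downstream with an erased accumulation type so they fit in one list.
        List<Collector<T, Object, R>> cs = new ArrayList<>();
        for (Collector<? super T, ?, R> c : downstreams) {
            cs.add((Collector<T, Object, R>) c);
        }
        return Collector.<T, List<Object>, List<R>>of(
                () -> { // one fresh container per downstream
                    List<Object> containers = new ArrayList<>();
                    for (Collector<T, Object, R> c : cs) containers.add(c.supplier().get());
                    return containers;
                },
                (containers, t) -> { // every element goes to every downstream
                    for (int i = 0; i < cs.size(); i++) cs.get(i).accumulator().accept(containers.get(i), t);
                },
                (left, right) -> { // parallel merge, position by position
                    for (int i = 0; i < cs.size(); i++) {
                        left.set(i, cs.get(i).combiner().apply(left.get(i), right.get(i)));
                    }
                    return left;
                },
                containers -> { // finish each result, keeping argument order
                    List<R> results = new ArrayList<>();
                    for (int i = 0; i < cs.size(); i++) results.add(cs.get(i).finisher().apply(containers.get(i)));
                    return results;
                });
    }

    // Example: three homogeneous Integer -> Integer collectors.
    static List<Integer> sumMaxMin(List<Integer> xs) {
        Collector<Integer, ?, Integer> sum = Collectors.summingInt(Integer::intValue);
        Collector<Integer, ?, Integer> max = Collectors.reducing(Integer.MIN_VALUE, Integer::max);
        Collector<Integer, ?, Integer> min = Collectors.reducing(Integer.MAX_VALUE, Integer::min);
        return xs.stream().collect(teeingAll(sum, max, min));
    }

    public static void main(String[] args) {
        System.out.println(sumMaxMin(List.of(3, 1, 4, 1, 5))); // prints [14, 5, 1]
    }
}
```

For simplicity the sketch declares no collector characteristics, so it is treated as ordered and non-concurrent regardless of its downstreams. Because the result is a `List<R>`, it can be streamed into another collector as described above; the trade-off is positional, untyped access to the individual results, which is the readability concern raised in Tagir's reply.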
>>>>
>>>> And, this is where I think "duplexing" runs out of gas -- duplex implies
>>>> "two". Pedantic argue-for-the-sake-of-argument folks might observe that
>>>> "tee" also has bilateral symmetry, but I don't think you could
>>>> reasonably argue that a four-way "tee" is as much of an arity abuse
>>>> as a four-way "duplex", and the plumbing industry would agree:
>>>>
>>>> https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
>>>>
>>>> So, for these reasons, I still think "teeing" has a better balance of
>>>> being both evocative of what it does and likely to stand the test of time.
>>>>
>>>> On 9/14/2018 1:09 PM, Stuart Marks wrote:
>>>>> First, naming. I think "duplex" as the root word wins! Using
>>>>> "duplexing" to conform to many of the other collectors is fine; so,
>>>>> "duplexing" is good.
>>