RFR: JDK-8205461 Create Collector which merges results of two other collectors

Brian Goetz brian.goetz at oracle.com
Fri Sep 21 13:25:56 UTC 2018


The example of ISS is a good one.  It is analogous to the question of 
"when is it right to write a class, and when is it right to write a
function?"  And the answer is, of course, "it depends."  ISS was an
obvious grouping, but even there, there was significant disagreement
during its design about what it should support and not (especially with 
regard to sum-of-squares calculations), and extra work done to make it 
extensible.  If you're writing from scratch, you might well consider 
writing something like ISS.
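
To make "something like ISS" concrete, here is a minimal sketch of a
from-scratch accumulator with the same accept/combine shape as
IntSummaryStatistics.  The class name IntSquareStatistics, its
accessors, and the collector() factory are hypothetical names chosen
for illustration; nothing like this is being proposed for the JDK here.

```java
import java.util.function.IntConsumer;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical accumulator in the style of IntSummaryStatistics that
// additionally tracks the sum of squares.
final class IntSquareStatistics implements IntConsumer {
    private long count;
    private long sum;
    private long sumOfSquares;

    // Records one value.
    @Override
    public void accept(int value) {
        count++;
        sum += value;
        sumOfSquares += (long) value * value;
    }

    // Folds another partial result into this one (used by parallel streams).
    public void combine(IntSquareStatistics other) {
        count += other.count;
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
    }

    public long getCount() { return count; }
    public long getSum() { return sum; }
    public long getSumOfSquares() { return sumOfSquares; }

    // Population variance, E[x^2] - (E[x])^2; 0.0 when nothing was recorded.
    public double getVariance() {
        if (count == 0) {
            return 0.0;
        }
        double mean = (double) sum / count;
        return (double) sumOfSquares / count - mean * mean;
    }

    // Wraps the accumulator as a Collector; the result is the accumulator itself.
    static Collector<Integer, ?, IntSquareStatistics> collector() {
        return Collector.of(
            IntSquareStatistics::new,
            (stats, x) -> stats.accept(x),
            (a, b) -> { a.combine(b); return a; });
    }

    public static void main(String[] args) {
        IntSquareStatistics s = Stream.of(1, 2, 3, 4).collect(collector());
        System.out.println(s.getSumOfSquares() + " " + s.getVariance());
    }
}
```

The named, typed accessors are the readability win of the class
approach; the cost is that every new statistic means editing the class
rather than composing an existing collector.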

But ... the whole motivation for having "teeing" _at all_ is that you 
have some existing collectors you want to reuse!  It seems a little 
silly to claim "I definitely will want to reuse two collectors, so much 
so that we need a new method, but can't imagine ever wanting to reuse 
three."

So, while I am not saying we have to solve the N-way problem now, I
think we'd be silly to pick a naming scheme that falls apart when we try 
to go past two.   So I'm still at "teeing".  It works for two, and it 
works for larger numbers as well.
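
As an illustration only, here is a minimal sketch of a two-way
"teeing" collector built with Collector.of, plus a three-collector case
obtained by nesting it.  The helper names (TeeingSketch, Pair, teeing,
average, summary) are assumptions made for this example, not the
signature under review.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class TeeingSketch {

    // Mutable holder for the two downstream accumulation containers.
    private static final class Pair<A1, A2> {
        A1 left;
        A2 right;
        Pair(A1 left, A2 right) { this.left = left; this.right = right; }
    }

    // Hypothetical two-way teeing: every element goes to both collectors,
    // and the two finished results are merged into one.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
            Collector<? super T, A1, R1> c1,
            Collector<? super T, A2, R2> c2,
            BiFunction<? super R1, ? super R2, R> merger) {
        return Collector.<T, Pair<A1, A2>, R>of(
            () -> new Pair<>(c1.supplier().get(), c2.supplier().get()),
            (p, t) -> {
                c1.accumulator().accept(p.left, t);
                c2.accumulator().accept(p.right, t);
            },
            (p, q) -> {
                p.left = c1.combiner().apply(p.left, q.left);
                p.right = c2.combiner().apply(p.right, q.right);
                return p;
            },
            p -> merger.apply(c1.finisher().apply(p.left),
                              c2.finisher().apply(p.right)));
    }

    // Two existing collectors reused: average = sum / count in one pass.
    static double average(List<Integer> xs) {
        Collector<Integer, ?, Integer> sum = Collectors.summingInt(Integer::intValue);
        Collector<Integer, ?, Long> count = Collectors.counting();
        return xs.stream().collect(teeing(sum, count, (s, n) -> s.doubleValue() / n));
    }

    // Three existing collectors reused by nesting; assumes a non-empty list.
    static String summary(List<Integer> xs) {
        Collector<Integer, ?, Optional<Integer>> min =
            Collectors.minBy(Comparator.<Integer>naturalOrder());
        Collector<Integer, ?, Optional<Integer>> max =
            Collectors.maxBy(Comparator.<Integer>naturalOrder());
        Collector<Integer, ?, String> range =
            teeing(min, max, (lo, hi) -> lo.get() + ".." + hi.get());
        Collector<Integer, ?, Long> count = Collectors.counting();
        return xs.stream().collect(teeing(range, count, (r, n) -> r + " (" + n + " values)"));
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2, 3, 4)));
        System.out.println(summary(Arrays.asList(3, 1, 4, 1, 5)));
    }
}
```

Nesting gets to three, but each extra level costs another merger; a
direct N-way form would avoid that, and the name "teeing" would still
fit it.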

On 9/16/2018 5:23 AM, Tagir Valeev wrote:
> Hello, Brian!
>
> Regarding more than two collectors. Some libraries definitely have
> analogs (e.g. [1]) which combine more than two collectors. In my
> opinion combining two collectors this way is an upper limit for
> readable code. Especially if you are going to collect to the list, you
> will have a list of untyped and unnamed results which positionally
> correspond to the collectors. If you have more than two collectors to
> combine, writing a separate accumulator class with accept/combine
> methods and creating a collector from scratch would be much easier
> to read and support. A good example is IntSummaryStatistics and the
> corresponding summarizingInt collector. It could be emulated combining
> four collectors (maxBy, minBy, summingInt, counting), but having a
> dedicated class IntSummaryStatistics which does all four things
> explicitly is much better. It could be easily reused outside of Stream
> API context, it has well-named and well-typed accessor methods and it
> may contain other domain-specific methods like average(). Imagine if
> it were a List of four elements and you had to call summary.get(1) to
> get a maximum. So I think that supporting more than two collectors
> would encourage obscure programming.
>
> With best regards,
> Tagir Valeev
>
> [1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple.java#L1282
> (don't ask me why!)
>
> On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>> tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely
>> to be a name we regret, for reasons outlined below.
>>
>>
>> The behavior of this Collector is:
>>    - duplicate the stream into two identical streams
>>    - collect the two streams with two collectors, yielding two results
>>    - merge the two results into a single result
>>
>> Obviously, a name like `duplexingAndCollectingAndThenMerging`, while
>> entirely accurate and explanatory, is "a bit" unwieldy.  So the
>> questions are:
>>    - how much can we drop and still be accurate?
>>    - which parts are best to drop?
>>
>> When we pick names, we are not just trying to pick the best name for
>> now, but we should imagine all the possible operations one might ever
>> want to do in the future (names in the JDK are forever) and make a
>> reasonable attempt to imagine whether this could cause confusion or
>> regret in the future.
>>
>> To evaluate "duplexing" here (which seems the most important thing to
>> keep), I'd ask: is there any other reasonable way to imagine a
>> `duplexing` collect operation, now or in the future?
>>
>> One could imagine wanting an operation that takes a stream and produces
>> two streams whose contents are that of the original stream.  And
>> "duplex" is a good name for that.  But, it is not a Collector; it would
>> be a stream transform, like concat.  So that doesn't seem a conflict; a
>> duplexing collector and a duplexing stream transform are sort of from
>> "different namespaces."
>>
>> Can one imagine a "duplexing" Collector that doesn't do any collection?
>> I cannot.  Something that returns a pair of streams would not be a
>> Collector, but something else. So dropping AndCollecting seems justified.
>>
>> What about "AndThenMerging"?  The purpose of collect is to reduce the
>> stream into a summary description.  Can we imagine a duplexing operation
>> that doesn't merge the two results, but instead just returns a tuple of
>> the results?  Yes, I can totally imagine this, especially once we have
>> value types and records, which make returning ad-hoc tuples cheaper
>> (syntactically, heap-wise, CPU-wise.)  So I think this is quite a
>> reasonable possibility. But, I would have no problem with an overload
>> that didn't take a merger and returned a tuple of the results, and was
>> still called `duplexing`.
>>
>> So I'm fine with dropping all the extra AndThisAndThat.
>>
>> Finally, there's one other obvious direction we might extend this --
>> more than two collectors.  There's no reason why we can only do two; we
>> could take a (likely homogeneous) varargs of Collectors, and return a
>> List of results -- which itself could then be streamed into another
>> collector.  This actually sounds pretty useful (though I'm not
>> suggesting doing this right now.) And, I think it would be silly if this
>> were not called the same thing as the two-collector version (just as it
>> would be silly to have separate names for "concat two" and "concat n".)
>>
>> And, this is where I think "duplexing" runs out of gas -- duplex implies
>> "two".  Pedantic argue-for-the-sake-of-argument folks might observe that
>> "tee" also has bilateral symmetry, but I think a four-way "tee" is
>> clearly less of an arity abuse than a four-way "duplex", and the
>> plumbing industry would agree:
>>
>> https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
>>
>> So, for these reasons, I still think "teeing" has a better balance of
>> being both evocative of what it does and likely to stand the test of time.
>>
>>
>>
>>
>> On 9/14/2018 1:09 PM, Stuart Marks wrote:
>>> First, naming. I think "duplex" as the root word wins! Using
>>> "duplexing" to conform to many of the other collectors is fine; so,
>>> "duplexing" is good.


