RFR: JDK-8205461 Create Collector which merges results of two other collectors

Brian Goetz brian.goetz at oracle.com
Sat Sep 15 15:36:45 UTC 2018


tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely 
to be a name we regret, for reasons outlined below.


The behavior of this Collector is:
  - duplicate the stream into two identical streams
  - collect the two streams with two collectors, yielding two results
  - merge the two results into a single result
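
A minimal sketch of the three steps above, written against the public 
Collector API. This is not the patch under review: the class name 
`TeeingSketch`, the `Pair` holder, and the `average` helper are 
hypothetical names used only for illustration. In this sketch the stream 
is not literally copied; each element is simply handed to both 
downstream accumulators.

```java
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TeeingSketch {

    // Hypothetical mutable holder for the two downstream containers.
    private static final class Pair<A1, A2> {
        A1 left;
        A2 right;

        Pair(A1 left, A2 right) {
            this.left = left;
            this.right = right;
        }
    }

    // Feeds every element to both collectors, then merges the two results.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
            Collector<? super T, A1, R1> first,
            Collector<? super T, A2, R2> second,
            BiFunction<? super R1, ? super R2, R> merger) {
        return Collector.<T, Pair<A1, A2>, R>of(
            // one fresh container per downstream collector
            () -> new Pair<A1, A2>(first.supplier().get(), second.supplier().get()),
            // "duplicate": the same element goes to both accumulators
            (pair, element) -> {
                first.accumulator().accept(pair.left, element);
                second.accumulator().accept(pair.right, element);
            },
            // combine partial results side by side (used by parallel streams)
            (a, b) -> {
                a.left = first.combiner().apply(a.left, b.left);
                a.right = second.combiner().apply(a.right, b.right);
                return a;
            },
            // finish both sides, then merge into a single result
            pair -> merger.apply(
                first.finisher().apply(pair.left),
                second.finisher().apply(pair.right)));
    }

    // Example use: an average computed from a sum and a count in one pass.
    static double average(Stream<Integer> numbers) {
        return numbers.collect(teeing(
            Collectors.summingInt(n -> n),
            Collectors.counting(),
            (sum, count) -> sum.doubleValue() / count));
    }

    public static void main(String[] args) {
        System.out.println(average(Stream.of(1, 2, 3, 4))); // prints 2.5
    }
}
```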

Obviously, a name like `duplexingAndCollectingAndThenMerging`, while 
entirely accurate and explanatory, is "a bit" unwieldy.  So the 
questions are:
  - how much can we drop and still be accurate
  - which parts are best to drop.

When we pick names, we are not just trying to pick the best name for 
now, but we should imagine all the possible operations one might ever 
want to do in the future (names in the JDK are forever) and make a 
reasonable attempt to imagine whether this could cause confusion or 
regret in the future.

To evaluate "duplexing" here (which seems the most important thing to 
keep), I'd ask: is there any other reasonable way to imagine a 
`duplexing` collect operation, now or in the future?

One could imagine wanting an operation that takes a stream and produces 
two streams whose contents are those of the original stream.  And 
"duplex" is a good name for that.  But, it is not a Collector; it would 
be a stream transform, like concat.  So that doesn't seem a conflict; a 
duplexing collector and a duplexing stream transform are sort of from 
"different namespaces."

Can one imagine a "duplexing" Collector that doesn't do any collection?  
I cannot.  Something that returns a pair of streams would not be a 
Collector, but something else. So dropping AndCollecting seems justified.

What about "AndThenMerging"?  The purpose of collect is to reduce the 
stream into a summary description.  Can we imagine a duplexing operation 
that doesn't merge the two results, but instead just returns a tuple of 
the results?  Yes, I can totally imagine this, especially once we have 
value types and records, which makes returning ad-hoc tuples cheaper 
(syntactically, heap-wise, CPU-wise.)  So I think this is quite a 
reasonable possibility. But, I would have no problem with an overload 
that didn't take a merger and returned a tuple of the result, and was 
still called `duplexing`.
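
To make the merger-less overload concrete, here is a rough sketch that 
returns the two results as a pair. Everything here is an assumption for 
illustration: the class `TupleTeeingSketch`, the `Box` holder, and the 
use of `Map.Entry` as a stand-in for the ad-hoc tuple (a record or value 
type would be the nicer choice once those exist).

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TupleTeeingSketch {

    // Hypothetical mutable holder for the two downstream containers.
    private static final class Box<A1, A2> {
        A1 left;
        A2 right;

        Box(A1 left, A2 right) {
            this.left = left;
            this.right = right;
        }
    }

    // No merger: the two finished results come back as a pair.
    // Map.Entry is only a stand-in for a real tuple type.
    static <T, A1, A2, R1, R2> Collector<T, ?, Map.Entry<R1, R2>> teeing(
            Collector<? super T, A1, R1> first,
            Collector<? super T, A2, R2> second) {
        return Collector.<T, Box<A1, A2>, Map.Entry<R1, R2>>of(
            () -> new Box<A1, A2>(first.supplier().get(), second.supplier().get()),
            (box, element) -> {
                first.accumulator().accept(box.left, element);
                second.accumulator().accept(box.right, element);
            },
            (a, b) -> {
                a.left = first.combiner().apply(a.left, b.left);
                a.right = second.combiner().apply(a.right, b.right);
                return a;
            },
            box -> new SimpleImmutableEntry<R1, R2>(
                first.finisher().apply(box.left),
                second.finisher().apply(box.right)));
    }

    // Example use: count the elements and keep them, in one pass.
    static Map.Entry<Long, List<String>> countAndList(Stream<String> words) {
        return words.collect(teeing(Collectors.counting(), Collectors.toList()));
    }

    public static void main(String[] args) {
        Map.Entry<Long, List<String>> result = countAndList(Stream.of("a", "b", "c"));
        System.out.println(result.getKey() + " " + result.getValue()); // prints 3 [a, b, c]
    }
}
```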

So I'm fine with dropping all the extra AndThisAndThat.

Finally, there's one other obvious direction we might extend this -- 
more than two collectors.  There's no reason why we can only do two; we 
could take a (likely homogeneous) varargs of Collectors, and return a 
List of results -- which itself could then be streamed into another 
collector.  This actually sounds pretty useful (though I'm not 
suggesting doing this right now.) And, I think it would be silly if this 
were not called the same thing as the two-collector version (just as it 
would be silly to have separate names for "concat two" and "concat n".)
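
A sketch of what the n-ary form might look like, assuming a homogeneous 
result type so the results fit in a List. Nothing here is proposed API: 
the class `NaryTeeingSketch` and the `sumMinMax` helper are hypothetical, 
and the unchecked cast is a shortcut a real implementation would likely 
avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NaryTeeingSketch {

    // Feeds every element to all collectors; returns their results in order.
    @SafeVarargs
    static <T, R> Collector<T, ?, List<R>> teeing(
            Collector<? super T, ?, ? extends R>... collectors) {
        // Erase the differing accumulation types to Object (sketch-only shortcut).
        List<Collector<T, Object, R>> downstream = new ArrayList<>();
        for (Collector<? super T, ?, ? extends R> c : collectors) {
            @SuppressWarnings("unchecked")
            Collector<T, Object, R> erased = (Collector<T, Object, R>) (Collector<?, ?, ?>) c;
            downstream.add(erased);
        }
        return Collector.<T, List<Object>, List<R>>of(
            () -> {
                List<Object> containers = new ArrayList<>();
                for (Collector<T, Object, R> c : downstream) {
                    containers.add(c.supplier().get());
                }
                return containers;
            },
            (containers, element) -> {
                for (int i = 0; i < downstream.size(); i++) {
                    downstream.get(i).accumulator().accept(containers.get(i), element);
                }
            },
            (a, b) -> {
                for (int i = 0; i < downstream.size(); i++) {
                    a.set(i, downstream.get(i).combiner().apply(a.get(i), b.get(i)));
                }
                return a;
            },
            containers -> {
                List<R> results = new ArrayList<>();
                for (int i = 0; i < downstream.size(); i++) {
                    results.add(downstream.get(i).finisher().apply(containers.get(i)));
                }
                return results;
            });
    }

    // Example use: sum, min, and max of a stream in a single pass.
    static List<Integer> sumMinMax(Stream<Integer> numbers) {
        Collector<Integer, ?, Integer> sum = Collectors.summingInt(n -> n);
        Collector<Integer, ?, Integer> min = Collectors.reducing(Integer.MAX_VALUE, Integer::min);
        Collector<Integer, ?, Integer> max = Collectors.reducing(Integer.MIN_VALUE, Integer::max);
        return numbers.collect(teeing(sum, min, max));
    }

    public static void main(String[] args) {
        System.out.println(sumMinMax(Stream.of(3, 1, 4, 1, 5))); // prints [14, 1, 5]
    }
}
```

Because the result is an ordinary List, it can be streamed into a 
further collector, as described above.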

And, this is where I think "duplexing" runs out of gas -- duplex implies 
"two".  Pedantic argue-for-the-sake-of-argument folks might observe that 
"tee" also has bilateral symmetry, but I think a four-way "tee" is 
clearly less of an arity abuse than a four-way "duplex", and the 
plumbing industry would agree:

https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM

So, for these reasons, I still think "teeing" has a better balance of 
being both evocative of what it does and likely to stand the test of time.




On 9/14/2018 1:09 PM, Stuart Marks wrote:
>
> First, naming. I think "duplex" as the root word wins! Using 
> "duplexing" to conform to many of other collectors is fine; so, 
> "duplexing" is good. 


