RFR: JDK-8205461 Create Collector which merges results of two other collectors

Tomasz Linkowski t.linkowski at gmail.com
Sun Sep 16 20:12:15 UTC 2018


I agree with Tagir that supporting more than two Collectors sounds risky. I
especially agree that well-typed and well-named accessors are important.

I use the library Tagir cited (jOOL), but I either:
- avoid all of its tuple-based functions, or
- use only Tuple2/Tuple3 and immediately map the tuple to a dedicated
result type (with Collectors.collectingAndThen), so that I get well-named
accessors.

Note that if you need to combine more than two (generally, N) collectors,
you can just call duplexing() N-1 times and use intermediate result
holders, like I did for N=3 in [1]. It may be a bit of boilerplate, but the
only *other* way to do it without tuples in a well-typed manner for N=3 would
be to introduce a new functional interface like TriFunction<T,U,V,R> as a
merger.
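
To make the N-1 idea concrete, here is a minimal sketch for N=3 (it is not
the code from [1]). Since the method under review is not available in a
released JDK, the sketch hand-rolls a `duplexing` helper with the
(collector, collector, merger) shape discussed in this thread; that helper,
the `Pair` container, and the `SumCount` and `Stats` holders are all
hypothetical names invented for the illustration.

```java
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DuplexingSketch {

    // Mutable pair holding the two downstream accumulation containers.
    private static final class Pair<A1, A2> {
        A1 left;
        A2 right;

        Pair(A1 left, A2 right) {
            this.left = left;
            this.right = right;
        }
    }

    // Hypothetical stand-in for the collector under review: every element goes to
    // both downstream collectors, and the merger combines their finished results.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> duplexing(
            Collector<? super T, A1, R1> c1,
            Collector<? super T, A2, R2> c2,
            BiFunction<? super R1, ? super R2, R> merger) {
        return Collector.<T, Pair<A1, A2>, R>of(
                () -> new Pair<>(c1.supplier().get(), c2.supplier().get()),
                (p, t) -> {
                    c1.accumulator().accept(p.left, t);
                    c2.accumulator().accept(p.right, t);
                },
                (p, q) -> {
                    p.left = c1.combiner().apply(p.left, q.left);
                    p.right = c2.combiner().apply(p.right, q.right);
                    return p;
                },
                p -> merger.apply(c1.finisher().apply(p.left), c2.finisher().apply(p.right)));
    }

    // Hypothetical intermediate holder for the first pairing.
    static final class SumCount {
        final int sum;
        final long count;

        SumCount(int sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Hypothetical final holder with well-typed, well-named accessors.
    static final class Stats {
        private final int sum;
        private final long count;
        private final int max;

        Stats(int sum, long count, int max) {
            this.sum = sum;
            this.count = count;
            this.max = max;
        }

        int sum() { return sum; }
        long count() { return count; }
        int max() { return max; }
    }

    // N = 3 collectors combined with N - 1 = 2 duplexing() calls.
    // Note: max() is Integer.MIN_VALUE for an empty stream in this sketch.
    static Collector<Integer, ?, Stats> stats() {
        Collector<Integer, ?, SumCount> sumCount = duplexing(
                Collectors.summingInt(Integer::intValue),
                Collectors.counting(),
                (sum, count) -> new SumCount(sum, count));
        return duplexing(
                sumCount,
                Collectors.reducing(Integer.MIN_VALUE, Integer::max),
                (sc, max) -> new Stats(sc.sum, sc.count, max));
    }

    public static void main(String[] args) {
        Stats s = Stream.of(3, 1, 4, 1, 5).collect(stats());
        System.out.println(s.sum() + " " + s.count() + " " + s.max()); // prints "14 5 5"
    }
}
```

The boilerplate is the `SumCount` holder: it exists only to carry two
results into the second merger, which is the cost of staying well-typed
without tuples or a TriFunction.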

That said, I found Brian's line of reasoning about dropping name parts very
convincing, and I really liked the analogy to a 4-way tee in plumbing.

Finally, here's a summary of the characteristics of the possible result
types for n-ary *heterogeneous* Collector composition:
- List<?> => well-typed: NO, well-named: NO
- n-ary tuple => well-typed: YES, well-named: NO
- custom result holder => well-typed: YES, well-named: YES
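
The difference shows up most clearly at the call site. The sketch below
hands back the same two values (a count and a longest word) in each of the
three shapes. `Map.Entry` stands in for a 2-tuple only because the JDK has
no tuple type, and `ResultShapes`, `WordStats`, and the three factory
methods are hypothetical names made up for this example.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class ResultShapes {

    // Hypothetical dedicated holder: well-typed and well-named.
    static final class WordStats {
        private final long count;
        private final String longest;

        WordStats(long count, String longest) {
            this.count = count;
            this.longest = longest;
        }

        long count() { return count; }
        String longest() { return longest; }
    }

    // List<?>: element types are lost and positions carry the meaning.
    static List<?> asListResult(long count, String longest) {
        return Arrays.asList(count, longest);
    }

    // 2-tuple stand-in: element types survive, names do not.
    static Map.Entry<Long, String> asTupleResult(long count, String longest) {
        return new AbstractMap.SimpleImmutableEntry<>(count, longest);
    }

    // Custom holder: both types and names survive.
    static WordStats asHolderResult(long count, String longest) {
        return new WordStats(count, longest);
    }

    public static void main(String[] args) {
        // A cast and a magic index are needed.
        long fromList = (Long) asListResult(3L, "banana").get(0);
        // Typed, but getKey() says nothing about a count.
        long fromTuple = asTupleResult(3L, "banana").getKey();
        // Typed and self-describing.
        long fromHolder = asHolderResult(3L, "banana").count();
        System.out.println(fromList + " " + fromTuple + " " + fromHolder); // prints "3 3 3"
    }
}
```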

Personally, I don't find n-ary *homogeneous* Collector composition all
that useful, but if it were to be added, I agree List<T> would be the best
result type.
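
For reference, the homogeneous shape could look roughly like the sketch
below: every collector yields the same result type R, so a List<R> loses
no type information. This is only my reading of the idea, not anything in
the webrev; `teeingAll`, `Slot`, and `countAndTotalLength` are hypothetical
names, and the sketch takes a List of collectors rather than varargs to
sidestep generic-array warnings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class HomogeneousSketch {

    // One downstream collector plus its accumulation container.
    private static final class Slot<T, A, R> {
        final Collector<T, A, R> collector;
        A container;

        Slot(Collector<T, A, R> collector) {
            this.collector = collector;
            this.container = collector.supplier().get();
        }

        void accept(T element) {
            collector.accumulator().accept(container, element);
        }

        // Slots at the same index always wrap the same collector, so the cast is safe.
        @SuppressWarnings("unchecked")
        void combine(Slot<T, ?, R> other) {
            container = collector.combiner().apply(container, (A) other.container);
        }

        R finish() {
            return collector.finisher().apply(container);
        }
    }

    // Generic factory so the wildcard accumulator type can be captured.
    private static <T, A, R> Slot<T, A, R> newSlot(Collector<T, A, R> collector) {
        return new Slot<>(collector);
    }

    // Hypothetical n-ary collector: results are returned in collector order.
    static <T, R> Collector<T, ?, List<R>> teeingAll(List<Collector<T, ?, R>> collectors) {
        return Collector.<T, List<Slot<T, ?, R>>, List<R>>of(
                () -> {
                    List<Slot<T, ?, R>> slots = new ArrayList<>();
                    for (Collector<T, ?, R> c : collectors) {
                        slots.add(newSlot(c));
                    }
                    return slots;
                },
                (slots, element) -> slots.forEach(s -> s.accept(element)),
                (left, right) -> {
                    for (int i = 0; i < left.size(); i++) {
                        left.get(i).combine(right.get(i));
                    }
                    return left;
                },
                slots -> {
                    List<R> results = new ArrayList<>();
                    for (Slot<T, ?, R> s : slots) {
                        results.add(s.finish());
                    }
                    return results;
                });
    }

    // Usage: two collectors that both produce a Long.
    static List<Long> countAndTotalLength(Stream<String> words) {
        List<Collector<String, ?, Long>> collectors = new ArrayList<>();
        collectors.add(Collectors.counting());
        collectors.add(Collectors.summingLong(String::length));
        return words.collect(teeingAll(collectors));
    }

    public static void main(String[] args) {
        System.out.println(countAndTotalLength(Stream.of("a", "bb", "ccc"))); // prints "[3, 6]"
    }
}
```

Even here the caller has to remember that index 0 is the count and index 1
is the total length, which is the well-named problem again.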

Regards,
Tomasz Linkowski

[1] https://stackoverflow.com/a/52211175/2032415


On Sun, Sep 16, 2018 at 11:23 AM, Tagir Valeev <amaembo at gmail.com> wrote:

> Hello, Brian!
>
> Regarding more than two collectors. Some libraries definitely have
> analogs (e.g. [1]) which combine more than two collectors. In my
> opinion, combining two collectors this way is an upper limit for
> readable code. Especially if you are going to collect to the list, you
> will have a list of untyped and unnamed results which positionally
> correspond to the collectors. If you have more than two collectors to
> combine, writing a separate accumulator class with accept/combine
> methods and creating a collector from scratch would be much easier
> to read and support. A good example is IntSummaryStatistics and the
> corresponding summarizingInt collector. It could be emulated combining
> four collectors (maxBy, minBy, summingInt, counting), but having a
> dedicated class IntSummaryStatistics which does all four things
> explicitly is much better. It could be easily reused outside of Stream
> API context, it has well-named and well-typed accessor methods and it
> may contain other domain-specific methods like average(). Imagine if
> it were a List of four elements and you had to call summary.get(1) to
> get a maximum. So I think that supporting more than two collectors
> would encourage obscure programming.
>
> With best regards,
> Tagir Valeev
>
> [1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple.java#L1282
> (don't ask me why!)
>
> On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz at oracle.com>
> wrote:
> >
> > tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely
> > to be a name we regret, for reasons outlined below.
> >
> >
> > The behavior of this Collector is:
> >   - duplicate the stream into two identical streams
> >   - collect the two streams with two collectors, yielding two results
> >   - merge the two results into a single result
> >
> > Obviously, a name like `duplexingAndCollectingAndThenMerging`, while
> > entirely accurate and explanatory, is "a bit" unwieldy.  So the
> > questions are:
> >   - how much can we drop and still be accurate
> >   - which parts are best to drop.
> >
> > When we pick names, we are not just trying to pick the best name for
> > now, but we should imagine all the possible operations one might ever
> > want to do in the future (names in the JDK are forever) and make a
> > reasonable attempt to imagine whether this could cause confusion or
> > regret in the future.
> >
> > To evaluate "duplexing" here (which seems the most important thing to
> > keep), I'd ask: is there any other reasonable way to imagine a
> > `duplexing` collect operation, now or in the future?
> >
> > One could imagine wanting an operation that takes a stream and produces
> > two streams whose contents are that of the original stream.  And
> > "duplex" is a good name for that.  But, it is not a Collector; it would
> > be a stream transform, like concat.  So that doesn't seem a conflict; a
> > duplexing collector and a duplexing stream transform are sort of from
> > "different namespaces."
> >
> > Can one imagine a "duplexing" Collector that doesn't do any collection?
> > I cannot.  Something that returns a pair of streams would not be a
> > Collector, but something else. So dropping AndCollecting seems justified.
> >
> > What about "AndThenMerging"?  The purpose of collect is to reduce the
> > stream into a summary description.  Can we imagine a duplexing operation
> > that doesn't merge the two results, but instead just returns a tuple of
> > the results?  Yes, I can totally imagine this, especially once we have
> > value types and records, which makes returning ad-hoc tuples cheaper
> > (syntactically, heap-wise, CPU-wise.)  So I think this is quite a
> > reasonable possibility. But, I would have no problem with an overload
> > that didn't take a merger and returned a tuple of the result, and was
> > still called `duplexing`.
> >
> > So I'm fine with dropping all the extra AndThisAndThat.
> >
> > Finally, there's one other obvious direction we might extend this --
> > more than two collectors.  There's no reason why we can only do two; we
> > could take a (likely homogeneous) varargs of Collectors, and return a
> > List of results -- which itself could then be streamed into another
> > collector.  This actually sounds pretty useful (though I'm not
> > suggesting doing this right now.) And, I think it would be silly if this
> > were not called the same thing as the two-collector version (just as it
> > would be silly to have separate names for "concat two" and "concat n".)
> >
> > And, this is where I think "duplexing" runs out of gas -- duplex implies
> > "two".  Pedantic argue-for-the-sake-of-argument folks might observe that
> > "tee" also has bilateral symmetry, but I don't think you could
> > reasonably argue that a four-way "tee" is not less of an arity abuse
> > than a four-way "duplex", and the plumbing industry would agree:
> >
> > https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
> >
> > So, for these reasons, I still think "teeing" has a better balance of
> > being both evocative of what it does and likely to stand the test of time.
> >
> >
> >
> >
> > On 9/14/2018 1:09 PM, Stuart Marks wrote:
> > >
> > > First, naming. I think "duplex" as the root word wins! Using
> > > "duplexing" to conform to many of the other collectors is fine; so,
> > > "duplexing" is good.
> >
>





More information about the core-libs-dev mailing list