Collectors update redux

Brian Goetz brian.goetz at oracle.com
Thu Feb 7 11:12:51 PST 2013


>     Is three-arg collect really the target "on ramp"?

Sorry, I was probably not clear.  It is the onramp to the mutable part 
of the reduce functionality, but it builds on the more functional 
flavors, as outlined in the "digression" section.

> IF you've been successfully spoon-fed the excellent examples (bitset
> etc.) then you can see it as reasonably simple. Otherwise you're pretty
> lost in the woods.

I think that's fair.  Which points, as we've already agreed, to the fact 
that this is mostly a pedagogical problem.

>     I would have thought the first stop would be the combinators. OTOH
>     ... there's a lot of stuff in there.
>
> I think there is *way* too much stuff in there, and I don't have enough
> time to even review it all before it gets set in stone. I strongly
> believe we would be smarter to keep the set of prepackaged Collectors
> much smaller and let third-party libraries experiment with which
> Collectors to provide.

Conceptually, the set is pretty simple:

base collectors == toCollection, toStatistics, toStringBuilder, 
joinedWith (takes Stream<T> plus T->U, produces Map<T,U>)

combinator for map+collector
combinator for groupBy+collector
combinator for groupBy+reduce
combinator for partition+collector
combinator for partition+reduce

plus defaults for the above: if you don't supply a downstream collector, 
it assumes "toCollection" (e.g., the no-arg groupBy).
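
To make the shape of these combinators concrete, here is a rough sketch. It uses the java.util.stream.Collectors methods as they later shipped in Java 8 (groupingBy, partitioningBy, mapping, reducing, toCollection), which may not match the draft names under discussion in this thread. The Txn class, its fields, and the sample data are made up for illustration.

```java
import java.util.*;
import java.util.stream.*;

class CollectorCombinators {
    // Hypothetical transaction type, invented for this sketch.
    static final class Txn {
        private final String buyer;
        private final int amount;
        Txn(String buyer, int amount) { this.buyer = buyer; this.amount = amount; }
        String buyer() { return buyer; }
        int amount() { return amount; }
    }

    // Made-up sample data.
    static List<Txn> sample() {
        return Arrays.asList(new Txn("alice", 10), new Txn("bob", 25), new Txn("alice", 5));
    }

    // groupBy with no downstream collector: each group defaults to a List.
    static Map<String, List<Txn>> byBuyer(List<Txn> txns) {
        return txns.stream().collect(Collectors.groupingBy(Txn::buyer));
    }

    // groupBy + reduce: sum the amounts within each group.
    static Map<String, Integer> totalByBuyer(List<Txn> txns) {
        return txns.stream().collect(
            Collectors.groupingBy(Txn::buyer,
                Collectors.reducing(0, Txn::amount, Integer::sum)));
    }

    // partition + (map + collector): buyer names on each side of a threshold,
    // gathered into a sorted set via toCollection.
    static Map<Boolean, TreeSet<String>> buyersBySize(List<Txn> txns, int threshold) {
        return txns.stream().collect(
            Collectors.partitioningBy(t -> t.amount() >= threshold,
                Collectors.mapping(Txn::buyer,
                    Collectors.toCollection(TreeSet::new))));
    }

    public static void main(String[] args) {
        List<Txn> txns = sample();
        System.out.println(byBuyer(txns).keySet());
        System.out.println(totalByBuyer(txns));
        System.out.println(buyersBySize(txns, 10));
    }
}
```

Each method body is a single collect() call, which is the sense in which the individual forms stay small even though the list of overloads is long.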

Individually, each of these is dead-simple both in concept and 
implementation (once you understand Collector) -- even the most complex 
are only 20 LoC, and many are 1-2 LoC.  I think what creates the 
perception of complexity is the number of forms that jumps out at you on 
the Javadoc page?

The one place where we might consider reducing scope is by eliminating 
the forms that take an explicit Supplier<Map>.  In other words, you 
always get a HashMap / ConcurrentHashMap.  This cuts the number of 
groupBy/join forms in half.  But it leaves those who want, say, to group 
to a TreeMap out in the cold.

Do we feel that would be an improvement?
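
For reference, the two forms in question look roughly like this. The sketch uses the three-argument groupingBy overload as it later shipped in Java 8 (classifier, map factory, downstream), which may differ from the draft signatures; the class and method names are invented for the example.

```java
import java.util.*;
import java.util.stream.*;

class MapSupplierForms {
    // Default form: the library chooses the Map implementation.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Explicit-supplier form: the caller picks the Map, here a TreeMap so
    // that the keys come back sorted.
    static TreeMap<Integer, List<String>> byLengthSorted(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "kiwi", "apple");
        System.out.println(byLength(words));
        System.out.println(byLengthSorted(words));
    }
}
```

Dropping the second form would remove one overload per grouping/joining method, and callers who want a TreeMap would have to copy the result into one afterwards.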

Alternately, we can refactor the Map-driven collectors so that instead 
of the Supplier<Map> being an argument, it can be a method on the Collector:

   collect(groupingBy(Txn::buyer).usingMap(TreeMap::new))

by having a ToMapCollector (extends Collector) with a usingMap() method.
This again gets us a nearly 2x reduction in the number of methods in 
Collectors, at the cost of moving the "pick your own map" functionality 
to somewhere else.
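
A minimal sketch of what such a collector could look like, written against the Collector interface as it later shipped in Java 8. Everything here is hypothetical: the GroupingToMap class, its groupingBy factory, and usingMap() are invented to illustrate the idea and are not JDK API. To stay short it only groups into Lists and ignores downstream collectors.

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

// Hypothetical collector: groups elements into lists and lets the caller
// swap the Map implementation through a method rather than an argument.
final class GroupingToMap<T, K> implements Collector<T, Map<K, List<T>>, Map<K, List<T>>> {
    private final Function<? super T, ? extends K> classifier;
    private final Supplier<Map<K, List<T>>> mapFactory;

    private GroupingToMap(Function<? super T, ? extends K> classifier,
                          Supplier<Map<K, List<T>>> mapFactory) {
        this.classifier = classifier;
        this.mapFactory = mapFactory;
    }

    // Default form: a HashMap unless usingMap() says otherwise.
    static <T, K> GroupingToMap<T, K> groupingBy(Function<? super T, ? extends K> classifier) {
        return new GroupingToMap<>(classifier, HashMap::new);
    }

    // "Pick your own map": returns a new collector with a different factory.
    GroupingToMap<T, K> usingMap(Supplier<Map<K, List<T>>> factory) {
        return new GroupingToMap<>(classifier, factory);
    }

    @Override public Supplier<Map<K, List<T>>> supplier() { return mapFactory; }

    @Override public BiConsumer<Map<K, List<T>>, T> accumulator() {
        return (map, item) ->
            map.computeIfAbsent(classifier.apply(item), k -> new ArrayList<>()).add(item);
    }

    @Override public BinaryOperator<Map<K, List<T>>> combiner() {
        // Merge the right-hand partial result into the left-hand one.
        return (left, right) -> {
            right.forEach((key, items) ->
                left.computeIfAbsent(key, k -> new ArrayList<>()).addAll(items));
            return left;
        };
    }

    @Override public Function<Map<K, List<T>>, Map<K, List<T>>> finisher() {
        return Function.identity();
    }

    @Override public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.IDENTITY_FINISH);
    }

    // Usage in the chained style proposed above.
    static Map<Integer, List<String>> demo() {
        return Stream.of("pear", "fig", "kiwi", "apple")
                     .collect(GroupingToMap.groupingBy(String::length).usingMap(TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One thing the sketch surfaces: because usingMap() is chained onto the result of groupingBy(), the compiler has to infer the factory's type arguments from the classifier alone. A method reference such as String::length gives it enough to go on; an implicitly typed lambda in the same position would likely need explicit type arguments.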



More information about the lambda-libs-spec-experts mailing list