Collectors inventory

Brian Goetz brian.goetz at oracle.com
Mon Mar 4 07:37:55 PST 2013


>> As I promised a long time ago, here's an overview of what's in
>> Collectors currently.
>
> I think there are too many methods in Collectors, we should restrain
> ourselves to 2 forms (3 max).

Let me make sure I understand the rationale for such a rule.

Having more forms has a clear advantage: the client code is simpler 
(e.g., free of extra noise like HashMap::new when the user doesn't care 
what Map he gets.)  And the implementations are trivial, so the 
implementation complexity is not an issue.  Is the sole issue here the 
"OMG so many Collectors" reaction when the user goes to the Javadoc page 
for Collectors?
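
To make "the implementations are trivial" concrete, here is a minimal sketch of a convenience overload that is a one-line delegation to the fuller form. The class name ConvenienceOverloads and its two methods are hypothetical, written only for illustration. They are built on java.util.stream.Collectors as it later shipped in Java 8, which may differ from the draft discussed in this thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
public class ConvenienceOverloads {

    // Full form: the caller picks the Map implementation.
    static <T, K, M extends Map<K, List<T>>> Collector<T, ?, M> groupingBy(
            Function<? super T, ? extends K> classifier, Supplier<M> mapFactory) {
        return Collectors.groupingBy(classifier, mapFactory, Collectors.toList());
    }

    // Convenience form: a one-line delegation that supplies a default Map.
    static <T, K> Collector<T, ?, Map<K, List<T>>> groupingBy(
            Function<? super T, ? extends K> classifier) {
        return groupingBy(classifier, HashMap::new);
    }

    public static void main(String[] args) {
        // Caller does not care which Map comes back.
        Map<Integer, List<String>> byLength =
                Stream.of("a", "bb", "cc").collect(groupingBy(String::length));

        // Caller wants sorted keys, so names the constructor explicitly.
        TreeMap<Integer, List<String>> sorted =
                Stream.of("a", "bb", "cc").collect(groupingBy(String::length, TreeMap::new));

        System.out.println(byLength);
        System.out.println(sorted);
    }
}
```

The cost of the extra overload is a single line of library code, while every call site that does not care about the Map type gets shorter.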

>> There are 12 basic forms:
>>  - toCollection(ctor)
>>  - toList()
>>  - toSet()
>>  - toStringBuilder()
>>  - toStringJoiner(delimiter)
>>  - to{Long,Double}Statistics
>>
>>  - groupingBy(classifier, mapFactory, downstream collector)
>>  - groupingReduce(classifier, mapFactory, mapper, reducer)
>>  - mapping(mappingFn, downstream collector)
>>  - joiningWith(mappingFunction, mergeFunction, mapFactory)
>>  - partitioningBy(predicate, downstream collector)
>>  - partitioningReduce(predicate, mapper, reducer)

To be clear, has anyone objected to any of these basic forms, or are we 
only talking about the variants?

>> GroupingBy has four forms:
>>  - groupingBy(T->K) -- standard groupBy, values of resulting Map are
>> Collection<T>
>>  - Same, but with explicit constructors for map and for rows (so you
>> can produce, say, a TreeMap<K, TreeSet<T>> and not just a
>> Map<K,Collection<T>>)
>>  - groupingBy(T->K, Collector<T,D>) -- multi-level groupBy, where
>> downstream is another Collector
>>  - Same, but with explicit ctor for map
>
> You can remove the third one, given you have the one with an explicit
> constructor.

I think it's a false economy to suggest removing this one.  Think about 
the user code:

   collect(groupBy(Foo::first, groupBy(Foo::second)))

is really clear.  The extra map ctor:

   collect(groupBy(Foo::first, groupBy(Foo::second), HashMap::new))

really feels like noise when reading the code -- all for the sake of 
removing a trivial overload?  Also, for some collectors, we may want a 
specialized Map implementation, one that is, say, optimized for merging 
(Partition, at this point, is basically groupBy with an optimized Map 
implementation), in which case the explicit HashMap::new is a 
performance impediment.

So, while I accept that removing the non-explicit-ctor versions could 
reduce the number of forms, I think it's a false economy -- because the 
resulting user code is worse.
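
For a side-by-side view of the two call shapes, here is a small runnable sketch written against java.util.stream.Collectors as it later shipped in Java 8. That is an assumption relative to this thread: the shipped method is named groupingBy and takes the map factory as its second argument. The Foo class and the GroupingOverloads wrapper are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical wrapper class, for illustration only.
public class GroupingOverloads {

    // Hypothetical stand-in for the Foo type used in the snippets above.
    static final class Foo {
        private final String first;
        private final String second;

        Foo(String first, String second) {
            this.first = first;
            this.second = second;
        }

        String first() { return first; }
        String second() { return second; }
    }

    // Two-level grouping with no map constructor: the library picks the Map.
    static Map<String, Map<String, List<Foo>>> implicitMap(List<Foo> foos) {
        return foos.stream().collect(
                Collectors.groupingBy(Foo::first, Collectors.groupingBy(Foo::second)));
    }

    // Same grouping, but the caller asks for a TreeMap at the outer level.
    static TreeMap<String, Map<String, List<Foo>>> explicitMap(List<Foo> foos) {
        return foos.stream().collect(
                Collectors.groupingBy(Foo::first, TreeMap::new,
                        Collectors.groupingBy(Foo::second)));
    }

    public static void main(String[] args) {
        List<Foo> foos = Arrays.asList(
                new Foo("b", "x"), new Foo("a", "y"), new Foo("a", "x"));

        // Same contents either way; only the Map implementation differs.
        System.out.println(implicitMap(foos).keySet());
        System.out.println(explicitMap(foos).keySet());
    }
}
```

The explicit form earns its extra argument only when the caller actually needs a particular Map, such as sorted keys here; otherwise the shorter call leaves the choice to the library.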



More information about the lambda-libs-spec-experts mailing list