Collectors inventory

Remi Forax forax at univ-mlv.fr
Mon Mar 4 10:27:12 PST 2013


On 03/04/2013 04:37 PM, Brian Goetz wrote:
>>> As I promised a long time ago, here's an overview of what's in
>>> Collectors currently.
>>
>> I think there are too many methods in Collectors, we should restrain
>> ourselves to 2 forms (3 max).
>
> Let me make sure I understand the rationale for such a rule.
>
> Having more forms has a clear advantage: the client code is simpler 
> (e.g., free of extra noise like HashMap::new when the user doesn't 
> care what Map he gets.)

Having to open and read the javadoc each time you want to use a 
Collector, or worse, each time you read code that uses a Collector, is a 
big disadvantage IMO. The whole Collector API has to fit into a human 
brain.

>   And the implementations are trivial, so the implementation 
> complexity is not an issue.

No, the issue is rather understanding the differences between all the 
overloads.

>   Is the sole issue here the "OMG so many Collectors" reaction when 
> the user goes to the Javadoc page for Collectors?

It's more "OMG, I have to read code that uses a Collector..."

>
>>> There are 12 basic forms:
>>>  - toCollection(ctor)
>>>  - toList()
>>>  - toSet()
>>>  - toStringBuilder()
>>>  - toStringJoiner(delimiter)
>>>  - to{Long,Double}Statistics
>>>
>>>  - groupingBy(classifier, mapFactory, downstream collector)
>>>  - groupingReduce(classifier, mapFactory, mapper, reducer)
>>>  - mapping(mappingFn, downstream collector)
>>>  - joiningWith(mappingFunction, mergeFunction, mapFactory)
>>>  - partitioningBy(predicate, downstream collector)
>>>  - partitioningReduce(predicate, mapper, reducer)
>
> To be clear, has anyone objected to any of these basic forms, or are 
> we only talking about the variants?

I am talking about variants.
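For reference, a minimal sketch of several of the basic forms. It assumes the Collectors API as it later shipped in JDK 8 (java.util.stream.Collectors), not the draft names listed above. The pairings in the comments (toStringJoiner to joining, to{Long,Double}Statistics to summarizingLong/summarizingDouble, partitioningReduce to partitioningBy plus reducing) are assumed rough counterparts. The class, method names, and sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class BasicForms {
    // Hypothetical sample input.
    static List<String> words() {
        return Arrays.asList("lambda", "stream", "collector", "map");
    }

    // toCollection(ctor): the caller picks the concrete collection type.
    static TreeSet<String> sortedWords(List<String> ws) {
        return ws.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // Assumed counterpart of toStringJoiner(delimiter): joining(delimiter).
    static String joined(List<String> ws) {
        return ws.stream().collect(Collectors.joining(","));
    }

    // Assumed counterpart of toLongStatistics: summarizingLong(mapper).
    static LongSummaryStatistics lengthStats(List<String> ws) {
        return ws.stream().collect(Collectors.summarizingLong(String::length));
    }

    // mapping(fn, downstream): transform each element, then feed it to toSet().
    static Set<Integer> distinctLengths(List<String> ws) {
        return ws.stream().collect(
            Collectors.mapping(String::length, Collectors.toSet()));
    }

    // partitioningBy(predicate, downstream): always yields a Map<Boolean, D>.
    static Map<Boolean, Long> countByLongWord(List<String> ws) {
        return ws.stream().collect(
            Collectors.partitioningBy(w -> w.length() > 5, Collectors.counting()));
    }

    // Assumed counterpart of partitioningReduce(predicate, mapper, reducer):
    // partitioningBy with a reducing(identity, mapper, op) downstream.
    static Map<Boolean, Integer> totalLengthByLongWord(List<String> ws) {
        return ws.stream().collect(Collectors.partitioningBy(
            w -> w.length() > 5,
            Collectors.reducing(0, String::length, Integer::sum)));
    }

    public static void main(String[] args) {
        List<String> ws = words();
        System.out.println(sortedWords(ws));
        System.out.println(joined(ws));
        System.out.println(lengthStats(ws));
        System.out.println(distinctLengths(ws));
        System.out.println(countByLongWord(ws));
        System.out.println(totalLengthByLongWord(ws));
    }
}
```

Each basic form here is a single call. The variants under discussion are the extra overloads layered on top of these.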

>
>>> GroupingBy has four forms:
>>>  - groupingBy(T->K) -- standard groupBy, values of resulting Map are
>>> Collection<T>
>>>  - Same, but with explicit constructors for map and for rows (so you
>>> can produce, say, a TreeMap<K, TreeSet<T>> and not just a
>>> Map<K,Collection<T>>)
>>>  - groupingBy(T->K, Collector<T,D>) -- multi-level groupBy, where
>>> downstream is another Collector
>>>  - Same, but with explicit ctor for map
>>
>> You can remove the third one, given that you have the one with an explicit
>> constructor.
>
> I think it's a false economy to suggest removing this one.  Think about 
> the user code:
>
>   collect(groupBy(Foo::first, groupBy(Foo::second)))
>
> is really clear.  The extra map ctor:
>
>   collect(groupBy(Foo::first, groupBy(Foo::second), HashMap::new))
>
> really feels like noise when reading the code -- all for the sake of 
> removing a trivial overload?  Also, for some collectors, we may want a 
> specialized Map implementation, one that is, say, optimized for 
> merging.  (Partition, at this point, is basically groupBy with an 
> optimized Map implementation.)  In which case the explicit 
> HashMap::new is a performance impediment.

If you have such a Map, you should make it public; people will reuse it.

As for groupBy of groupBy, it's a corner case, and for a corner case it's 
usually better to be a little more verbose if you end up with only one 
form. Again, it's easier to read and easier to write.
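To make the trade-off concrete, a minimal sketch of the four groupingBy forms. It assumes java.util.stream.Collectors as it later shipped in JDK 8, which uses the name groupingBy and places the map factory before the downstream collector, unlike the draft call quoted above. Foo, its first()/second() accessors, and the helper method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class GroupingForms {
    // Hypothetical element type standing in for the thread's Foo.
    static final class Foo implements Comparable<Foo> {
        private final String first;
        private final String second;

        Foo(String first, String second) {
            this.first = first;
            this.second = second;
        }

        String first() { return first; }
        String second() { return second; }

        // Natural ordering so that Foo can be stored in a TreeSet.
        @Override
        public int compareTo(Foo other) {
            int c = first.compareTo(other.first);
            return c != 0 ? c : second.compareTo(other.second);
        }

        @Override
        public String toString() { return first + "/" + second; }
    }

    static List<Foo> sample() {
        return Arrays.asList(new Foo("a", "x"), new Foo("a", "y"), new Foo("b", "x"));
    }

    // Plain groupBy: default map, rows are List<Foo> in the shipped API.
    static Map<String, List<Foo>> byFirst(List<Foo> foos) {
        return foos.stream().collect(Collectors.groupingBy(Foo::first));
    }

    // Explicit map and row containers, giving a TreeMap<K, TreeSet<T>>.
    static TreeMap<String, TreeSet<Foo>> sortedRows(List<Foo> foos) {
        return foos.stream().collect(Collectors.groupingBy(
            Foo::first, TreeMap::new, Collectors.toCollection(TreeSet::new)));
    }

    // Multi-level groupBy with the default outer map.
    static Map<String, Map<String, List<Foo>>> nested(List<Foo> foos) {
        return foos.stream().collect(Collectors.groupingBy(
            Foo::first, Collectors.groupingBy(Foo::second)));
    }

    // Same, but spelling out the outer map factory (it precedes the downstream).
    static Map<String, Map<String, List<Foo>>> nestedExplicit(List<Foo> foos) {
        return foos.stream().collect(Collectors.groupingBy(
            Foo::first, HashMap::new, Collectors.groupingBy(Foo::second)));
    }

    public static void main(String[] args) {
        List<Foo> foos = sample();
        System.out.println(byFirst(foos));
        System.out.println(sortedRows(foos));
        System.out.println(nested(foos));
        System.out.println(nestedExplicit(foos));
    }
}
```

In the shipped API the explicit-row-constructor form is not a separate overload. It is expressed by passing toCollection(TreeSet::new) as the downstream collector.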

>
> So, while I accept that removing the non-explicit-ctor versions could 
> reduce the number of forms, I think it's a false economy -- because the 
> resulting user code is worse.
>

User code is better because there are fewer overloads (or better ones) that 
can match.

Maybe later, for JDK 9 or JDK 10, you can add more collectors if people 
ask for them, but I think here it's important to be as simple as possible.

Rémi



More information about the lambda-libs-spec-experts mailing list