Collectors inventory
Remi Forax
forax at univ-mlv.fr
Mon Mar 4 10:27:12 PST 2013
On 03/04/2013 04:37 PM, Brian Goetz wrote:
>>> As I promised a long time ago, here's an overview of what's in
>>> Collectors currently.
>>
>> I think there are too many methods in Collectors; we should restrict
>> ourselves to 2 forms (3 max).
>
> Let me make sure I understand the rationale for such a rule.
>
> Having more forms has a clear advantage: the client code is simpler
> (e.g., free of extra noise like HashMap::new when the user doesn't
> care what Map he gets.)
Having to open and read the javadoc each time you want to use a
Collector, or worse, each time you read code that uses a Collector, is a
big disadvantage IMO. The whole Collector API has to fit into a human
brain.
> And the implementations are trivial, so the implementation
> complexity is not an issue.
No, the issue is more about understanding the differences between all the
overloads.
> Is the sole issue here the "OMG so many Collectors" reaction when
> the user goes to the Javadoc page for Collectors?
It's more "OMG, I have to read code that uses a Collector" ...
>
>>> There are 12 basic forms:
>>> - toCollection(ctor)
>>> - toList()
>>> - toSet()
>>> - toStringBuilder()
>>> - toStringJoiner(delimiter)
>>> - to{Long,Double}Statistics
>>>
>>> - groupingBy(classifier, mapFactory, downstream collector)
>>> - groupingReduce(classifier, mapFactory, mapper, reducer)
>>> - mapping(mappingFn, downstream collector)
>>> - joiningWith(mappingFunction, mergeFunction, mapFactory)
>>> - partitioningBy(predicate, downstream collector)
>>> - partitioningReduce(predicate, mapper, reducer)
>
> To be clear, has anyone objected to any of these basic forms, or are
> we only talking about the variants?
I am talking about variants.
>
>>> GroupingBy has four forms:
>>> - groupingBy(T->K) -- standard groupBy, values of resulting Map are
>>> Collection<T>
>>> - Same, but with explicit constructors for map and for rows (so you
>>> can produce, say, a TreeMap<K, TreeSet<T>> and not just a
>>> Map<K,Collection<T>>)
>>> - groupingBy(T->K, Collector<T,D>) -- multi-level groupBy, where
>>> downstream is another Collector
>>> - Same, but with explicit ctor for map
>>
>> You can remove the third one, given that you have the one with an
>> explicit constructor.
>
> I think it's a false economy to suggest removing this one. Think about
> the user code:
>
> collect(groupBy(Foo::first, groupBy(Foo::second)))
>
> is really clear. The extra map ctor:
>
> collect(groupBy(Foo::first, groupBy(Foo::second), HashMap::new))
>
> really feels like noise when reading the code -- all for the sake of
> removing a trivial overload? Also, for some collectors, we may want a
> specialized Map implementation, one that is, say, optimized for
> merging. (Partition, at this point, is basically groupBy with an
> optimized Map implementation.) In which case the explicit
> HashMap::new is a performance impediment.
If you have such a Map, you should make it public; people will reuse it.
As for groupBy of groupBy, it's a corner case, and for a corner case it's
usually better to be a little more verbose if you end up with only one
form. Again, it's easier to read and easier to write.
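
To make the trade-off concrete, here is a minimal sketch of the two call
shapes under discussion. It is written against java.util.stream.Collectors
as it later shipped in Java 8, not the draft API described in this thread:
there the method is named groupingBy and the map factory is the second
argument rather than the last. The class GroupingForms, its method names,
and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical holder class; only the Collectors calls come from the JDK.
final class GroupingForms {

    // Short form: group by first letter, then by length.
    // The caller does not choose (or see) the outer Map implementation.
    static Map<Character, Map<Integer, List<String>>> implicitMap(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(w -> w.charAt(0),
                Collectors.groupingBy(String::length)));
    }

    // Explicit form: same grouping, but the outer map is built by TreeMap::new,
    // so its keys are sorted and the static type can be TreeMap.
    static TreeMap<Character, Map<Integer, List<String>>> explicitMap(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(w -> w.charAt(0), TreeMap::new,
                Collectors.groupingBy(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "ant", "bean", "beet", "avocado");
        System.out.println(implicitMap(words));
        System.out.println(explicitMap(words));
    }
}
```

Both methods produce maps with the same entries; the only differences are
the extra argument at the call site and the guarantee about which Map
implementation comes back, which is exactly the point being argued above.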
>
> So, while I accept that removing the non-explicit-ctor versions could
> reduce the number of forms, I think it's a false economy -- because the
> resulting user code is worse.
>
User code is better because there are fewer overloads (or better ones)
that can match.
Maybe later, for JDK 9 or JDK 10, you can add more collectors if people
ask, but I think it's important here to be as simple as possible.
Rémi
More information about the lambda-libs-spec-experts
mailing list