Collectors inventory
Brian Goetz
brian.goetz at oracle.com
Mon Mar 4 07:37:55 PST 2013
>> As I promised a long time ago, here's an overview of what's in
>> Collectors currently.
>
> I think there are too many methods in Collectors, we should restrain
> ourselves to 2 forms (3 max).
Let me make sure I understand the rationale for such a rule.
Having more forms has a clear advantage: the client code is simpler
(e.g., free of extra noise like HashMap::new when the user doesn't care
what Map he gets.) And the implementations are trivial, so the
implementation complexity is not an issue. Is the sole issue here the
"OMG so many Collectors" reaction when the user goes to the Javadoc page
for Collectors?
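
To make the trade-off concrete, here is a minimal sketch of the two styles. It assumes the java.util.stream.Collectors API as it later shipped in Java 8, where the method is called groupingBy and the map factory is the second argument; the draft under discussion may differ. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DefaultedForms {
    // Short form: the caller accepts whatever Map and List the library picks.
    static Map<Integer, List<String>> byLengthShort(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Explicit form: same grouping, but the caller spells out the containers.
    static Map<Integer, List<String>> byLengthExplicit(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(String::length, HashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "ddd");
        // Both forms produce maps with the same contents.
        System.out.println(byLengthShort(words).equals(byLengthExplicit(words)));
    }
}
```

The two methods return equal maps; the only difference the reader sees is the extra HashMap::new and toList() at the call site.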
>> There are 12 basic forms:
>> - toCollection(ctor)
>> - toList()
>> - toSet()
>> - toStringBuilder()
>> - toStringJoiner(delimiter)
>> - to{Long,Double}Statistics
>>
>> - groupingBy(classifier, mapFactory, downstream collector)
>> - groupingReduce(classifier, mapFactory, mapper, reducer)
>> - mapping(mappingFn, downstream collector)
>> - joiningWith(mappingFunction, mergeFunction, mapFactory)
>> - partitioningBy(predicate, downstream collector)
>> - partitioningReduce(predicate, mapper, reducer)
To be clear, has anyone objected to any of these basic forms, or are we
only talking about the variants?
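
For readers who want to see a few of the basic forms in use, here is a rough sketch. It is written against the Collectors class as it shipped in Java 8, not the draft listed above, so some names are assumptions: in particular, joining(delimiter) is used here as the apparent counterpart of toStringJoiner(delimiter). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class BasicForms {
    // toCollection(ctor): the caller supplies the container constructor.
    static TreeSet<String> sorted(List<String> words) {
        return words.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // mapping(mappingFn, downstream): transform each element, then pass it
    // to the downstream collector.
    static Set<Integer> lengths(List<String> words) {
        return words.stream().collect(
            Collectors.mapping(String::length, Collectors.toSet()));
    }

    // joining(delimiter): assumed counterpart of toStringJoiner(delimiter).
    static String joined(List<String> words) {
        return words.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bb", "a", "bb");
        System.out.println(sorted(words));
        System.out.println(lengths(words));
        System.out.println(joined(words));
    }
}
```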
>> GroupingBy has four forms:
>> - groupingBy(T->K) -- standard groupBy, values of resulting Map are
>> Collection<T>
>> - Same, but with explicit constructors for map and for rows (so you
>> can produce, say, a TreeMap<K, TreeSet<T>> and not just a
>> Map<K,Collection<T>>)
>> - groupingBy(T->K, Collector<T,D>) -- multi-level groupBy, where
>> downstream is another Collector
>> - Same, but with explicit ctor for map
>
> You can remove the third one, given you have the one with an explicit
> constructor.
I think it's a false economy to suggest removing this one. Think about
the user code:

    collect(groupBy(Foo::first, groupBy(Foo::second)))

is really clear. The extra map ctor:

    collect(groupBy(Foo::first, groupBy(Foo::second), HashMap::new))

really feels like noise when reading the code -- all for the sake of
removing a trivial overload? Also, for some collectors, we may want a
specialized Map implementation, one that is, say, optimized for merging.
(Partition, at this point, is basically groupBy with an optimized Map
implementation.) In which case the explicit HashMap::new is a
performance impediment.
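
The relationship between partitioning and grouping can be sketched as follows. This assumes the partitioningBy and groupingBy collectors as they later shipped in Java 8; the class and method names are hypothetical. Whether the library's partition Map is actually faster is not something this sketch measures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionVsGroup {
    // Partition by a predicate; the library chooses the Map implementation.
    static Map<Boolean, List<Integer>> partitioned(List<Integer> nums) {
        return nums.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // The same split expressed as a grouping with a boolean-valued classifier.
    static Map<Boolean, List<Integer>> grouped(List<Integer> nums) {
        return nums.stream().collect(Collectors.groupingBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6);
        // With both odd and even inputs present, the two maps are equal.
        System.out.println(partitioned(nums).equals(grouped(nums)));
    }
}
```

One visible difference in the shipped API: the Map from partitioningBy always contains both the true and false keys, while groupingBy only creates keys that actually occur in the input.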
So, while I accept that removing the non-explicit-ctor versions could
reduce the number of forms, I think it's a false economy -- because the
resulting user code is worse.
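
For comparison, here are the two spellings of the two-level grouping as complete code. This is a sketch against the Collectors API as it later shipped in Java 8, where the method is named groupingBy and the map factory comes second rather than last. Foo here is a hypothetical stand-in with two String properties, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedGrouping {
    // Hypothetical element type with two properties to group on.
    static final class Foo {
        private final String first;
        private final String second;

        Foo(String first, String second) {
            this.first = first;
            this.second = second;
        }

        String first() { return first; }
        String second() { return second; }
    }

    // Two-level grouping, letting the library pick the outer Map.
    static Map<String, Map<String, List<Foo>>> implicitMap(List<Foo> foos) {
        return foos.stream().collect(
            Collectors.groupingBy(Foo::first, Collectors.groupingBy(Foo::second)));
    }

    // Same grouping with the outer Map constructor spelled out.
    static Map<String, Map<String, List<Foo>>> explicitMap(List<Foo> foos) {
        return foos.stream().collect(
            Collectors.groupingBy(Foo::first, HashMap::new,
                Collectors.groupingBy(Foo::second)));
    }

    public static void main(String[] args) {
        List<Foo> foos = Arrays.asList(
            new Foo("a", "x"), new Foo("a", "y"), new Foo("b", "x"));
        // Both spellings group the same elements under the same keys.
        System.out.println(implicitMap(foos).equals(explicitMap(foos)));
    }
}
```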