Collectors inventory
Brian Goetz
brian.goetz at oracle.com
Sun Mar 10 14:20:14 PDT 2013
OK, I've revamped Collectors in a way that may avoid the overload that
Kevin, Remi, and Joe were concerned about. At the same time, I've
integrated concurrent collection into the model in a more obvious way.
The key problem is grouping-by. There are essentially sixteen forms of
groupingBy:
    { concurrent, not }
  x { with explicit Map constructors, not }
  x { simple group-by, cascaded group-by (downstream collector),
      simple reduce, map-reduce }
It's pretty hard to argue that any of these dimensions can be obviously
jettisoned. And simply pruning around the edges (e.g., "get rid of this
variant") doesn't do the job. Nor does "only provide the most general
form", which guarantees that no one will be able to use it at all.
With the help of Don and his team last week, I came up with an alternate
framing for groupingBy (and also partitioningBy, which has the same
problems). The key is to introduce an additional type, call it
GroupingCollector, off of which we can hang some of the variants, and
this lets us reduce the number of top-level collectors.
The current inventory, under this scheme (which I'll check in soon) is:
- to{Collection,List,Set}
- toString{Builder,Joiner}
- to{Int,Long,Double}Statistics
- toMap(mappingFn) // was mappedTo
- toMap(mappingFn, mapCtor)
- toConcurrentMap(mappingFn) // was ConcurrentCollectors.mappedTo
- toConcurrentMap(mappingFn, mapCtor)
- mapping(mappingFn, downstreamCollector) // plus primitive forms
- groupingBy(classifierFn)
- groupingBy(classifierFn, mapCtor)
- groupingByConcurrent(classifierFn)
- groupingByConcurrent(classifierFn, mapCtor)
- partitioningBy(predicate)
- partitioningByConcurrent(predicate)
This is a significant reduction in top-level forms -- we drop from 16
groupingXxx forms to four, a similar reduction for partitioning forms,
and -- most importantly -- ConcurrentCollectors *just goes away*.
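
For comparison, here is a minimal sketch of the "explicit Map constructor" and "concurrent" dimensions, written against java.util.stream.Collectors as it was later released in Java 8 rather than the draft API listed above. In the released API the map constructor is passed together with a downstream collector (there is no two-argument classifier/constructor form), so the signatures differ from this inventory. The class name MapCtorDemo and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.Collectors;

final class MapCtorDemo {
    // Hypothetical sample data.
    static final List<String> WORDS =
        Arrays.asList("kiwi", "fig", "plum", "pear", "yam");

    // Sequential grouping into a caller-chosen Map implementation.
    static TreeMap<Integer, List<String>> byLength() {
        return WORDS.stream().collect(
            Collectors.groupingBy(String::length, TreeMap::new,
                                  Collectors.toList()));
    }

    // Concurrent grouping: worker threads accumulate into one shared
    // ConcurrentMap, so the order of elements within a group is not guaranteed.
    static ConcurrentSkipListMap<Integer, List<String>> byLengthConcurrent() {
        return WORDS.parallelStream().collect(
            Collectors.groupingByConcurrent(String::length,
                                            ConcurrentSkipListMap::new,
                                            Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(byLength());
        System.out.println(byLengthConcurrent());
    }
}
```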
The complexity moves into the return type of groupingBy, which gets more
complicated. Instead of returning a simple Collector, it returns a
GroupingCollector. In its current form, GroupingCollector implements
Collector -- meaning you can use groupingBy(f) as a plain collector --
but the more advanced forms (cascading, reducing) are hanging as extra
methods off the GroupingCollector.
For example:
    // Simple form -- people by city
    Map<City, Collection<Person>> m
        = people.stream().collect(groupingBy(Person::getCity));

    // Two-level form -- people by state, city
    // Uses .then(otherCollector) method
    Map<State, Map<City, Collection<Person>>> m
        = people.stream()
                .collect(groupingBy(Person::getState)
                         .then(groupingBy(Person::getCity)));

    // Reducing form -- count of people by city
    // Uses .thenReducing(mapper, reducer) method
    Map<City, Integer> m
        = people.stream()
                .collect(groupingBy(Person::getCity)
                         .thenReducing(p -> 1, Integer::sum));
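
The .then(...) and .thenReducing(...) methods above belong to the draft and cannot be run against a released JDK. As a rough runnable equivalent, the sketch below uses the Collectors methods that shipped in Java 8, where the downstream collector is a second argument to groupingBy instead of a chained call. The Person class, its String-typed state and city fields, and the sample data are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ShippedGroupingDemo {
    // Hypothetical stand-in for the Person type used in the examples.
    static final class Person {
        final String name, state, city;
        Person(String name, String state, String city) {
            this.name = name; this.state = state; this.city = city;
        }
        String getState() { return state; }
        String getCity() { return city; }
    }

    static List<Person> sample() {
        return Arrays.asList(
            new Person("Ann", "MA", "Boston"),
            new Person("Bob", "MA", "Boston"),
            new Person("Cy", "MA", "Salem"),
            new Person("Di", "OR", "Salem"));
    }

    // Simple form: people by city.
    static Map<String, List<Person>> byCity(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getCity));
    }

    // Two-level form: the downstream collector is the second argument.
    static Map<String, Map<String, List<Person>>> byStateThenCity(List<Person> people) {
        return people.stream().collect(
            Collectors.groupingBy(Person::getState,
                Collectors.groupingBy(Person::getCity)));
    }

    // Reducing form: count of people by city.
    static Map<String, Long> countByCity(List<Person> people) {
        return people.stream().collect(
            Collectors.groupingBy(Person::getCity, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByCity(sample()));
    }
}
```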
The methods that appear on GroupingCollector are:
.then(Collector downstream) -- cascaded groupBy
.thenReducing(BinaryOperator<T>) -- reduce
.thenReducing(Function<T,U>, BinaryOperator<U>) -- map/reduce
Partitioning is similar except the thenReducing methods need an identity
argument too.
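
One plausible reading of why partitioning needs an identity: a partition can be empty, and the result still has to hold a value for it. The sketch below shows that behaviour with the Collectors API as released in Java 8 (partitioningBy with a reducing downstream), not the draft thenReducing method. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitionReduceDemo {
    // Counts how many numbers fall on each side of the predicate.
    // The identity (0) is what an empty partition reduces to.
    static Map<Boolean, Integer> countEvenOdd(List<Integer> numbers) {
        return numbers.stream().collect(
            Collectors.partitioningBy(
                n -> n % 2 == 0,
                Collectors.reducing(0, n -> 1, Integer::sum)));
    }

    public static void main(String[] args) {
        // No even numbers here, yet the "true" entry is still present.
        System.out.println(countEvenOdd(Arrays.asList(1, 3, 5)));
    }
}
```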
    public static interface GroupingCollector<T, K>
            extends Collector<T, Map<K, Collection<T>>> {
        <D> Collector<T, Map<K, D>> then(Collector<T, D> downstream);
        Collector<T, Map<K, T>> thenReducing(BinaryOperator<T> reducer);
        <U> Collector<T, Map<K, U>> thenReducing(
                Function<? super T, ? extends U> mapper,
                BinaryOperator<U> reducer);
    }
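
To make the dual role concrete, here is a minimal sketch of a GroupingCollector-like type layered on the three-parameter Collector<T, A, R> interface that eventually shipped in Java 8. It is not the draft implementation: the accumulation into a HashMap of ArrayLists, the delegation to Collectors.groupingBy and Collectors.toMap, and all names in GroupingSketch are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GroupingSketch {
    // Hypothetical analogue of the draft GroupingCollector.
    static final class GroupingCollector<T, K>
            implements Collector<T, Map<K, Collection<T>>, Map<K, Collection<T>>> {
        private final Function<? super T, ? extends K> classifier;

        GroupingCollector(Function<? super T, ? extends K> classifier) {
            this.classifier = classifier;
        }

        // Collector role: the simple group-by.
        @Override public Supplier<Map<K, Collection<T>>> supplier() {
            return HashMap::new;
        }
        @Override public BiConsumer<Map<K, Collection<T>>, T> accumulator() {
            return (map, t) -> map
                .computeIfAbsent(classifier.apply(t), k -> new ArrayList<T>())
                .add(t);
        }
        @Override public BinaryOperator<Map<K, Collection<T>>> combiner() {
            return (left, right) -> {
                right.forEach((k, v) ->
                    left.merge(k, v, (a, b) -> { a.addAll(b); return a; }));
                return left;
            };
        }
        @Override public Function<Map<K, Collection<T>>, Map<K, Collection<T>>> finisher() {
            return Function.identity();
        }
        @Override public Set<Collector.Characteristics> characteristics() {
            return Collections.unmodifiableSet(
                EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
        }

        // Factory role: cascaded and reducing forms.
        <D> Collector<T, ?, Map<K, D>> then(Collector<? super T, ?, D> downstream) {
            return Collectors.groupingBy(classifier, downstream);
        }
        Collector<T, ?, Map<K, T>> thenReducing(BinaryOperator<T> reducer) {
            return Collectors.toMap(classifier, Function.identity(), reducer);
        }
        <U> Collector<T, ?, Map<K, U>> thenReducing(
                Function<? super T, ? extends U> mapper, BinaryOperator<U> reducer) {
            return Collectors.toMap(classifier, mapper, reducer);
        }
    }

    static <T, K> GroupingCollector<T, K> groupingBy(
            Function<? super T, ? extends K> classifier) {
        return new GroupingCollector<>(classifier);
    }

    static Character initial(String word) { return word.charAt(0); }

    static Stream<String> words() { return Stream.of("fig", "kiwi", "plum", "yam"); }

    // Used directly as a Collector.
    static Map<Integer, Collection<String>> wordsByLength() {
        return words().collect(groupingBy(String::length));
    }
    // Used as a factory: cascaded group-by.
    static Map<Integer, Map<Character, Collection<String>>> byLengthThenInitial() {
        return words().collect(
            groupingBy(String::length).then(groupingBy(GroupingSketch::initial)));
    }
    // Used as a factory: map/reduce per group.
    static Map<Integer, Integer> countByLength() {
        return words().collect(
            groupingBy(String::length).thenReducing(w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(wordsByLength());
        System.out.println(countByLength());
    }
}
```

The same object is passed straight to collect() in wordsByLength() and is asked to produce a different collector in the other two methods, which is the tradeoff under discussion.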
The slightly weird thing about this is that a GroupingCollector is both
a Collector (for the simple form) and a factory for collectors (for the
cascaded forms). This makes the user code better (a simple group-by is
just collect(groupingBy(f))), but makes the type harder to
understand. We can adjust this tradeoff by severing the "extends
Collector" and adding another method for "get me a simple collector",
but I'm not sure this is an improvement. This would probably look like:
groupingBy(fn).toList()
or some such.
One variant we did jettison is the one where you provide an explicit
Collection ctor, so you could group into a Set<Person> instead of a
List<Person>. (You can still get this with
groupingBy(f).then(toCollection(ctor)).) If we did the above
transformation, this could come back as:
groupingBy(fn).toCollection(ctor)
or some such.
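
A minimal sketch of what that factory-only alternative could look like, again built on the Collectors API as released in Java 8. The Grouping type, its toList() and toCollection(ctor) methods, and the sample data are hypothetical; the point is only that the object returned by groupingBy is no longer a Collector itself.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GroupingFactorySketch {
    // Hypothetical factory-only variant: does NOT implement Collector.
    static final class Grouping<T, K> {
        private final Function<? super T, ? extends K> classifier;

        Grouping(Function<? super T, ? extends K> classifier) {
            this.classifier = classifier;
        }

        // "Get me a simple collector."
        Collector<T, ?, Map<K, List<T>>> toList() {
            return Collectors.groupingBy(classifier, Collectors.toList());
        }

        // The "grouping to explicit collection" form.
        <C extends Collection<T>> Collector<T, ?, Map<K, C>> toCollection(Supplier<C> ctor) {
            return Collectors.groupingBy(classifier, Collectors.toCollection(ctor));
        }
    }

    static <T, K> Grouping<T, K> groupingBy(Function<? super T, ? extends K> classifier) {
        return new Grouping<>(classifier);
    }

    static Stream<String> words() { return Stream.of("fig", "fig", "yam", "kiwi"); }

    static Map<Integer, List<String>> asLists() {
        return words().collect(groupingBy(String::length).toList());
    }

    static Map<Integer, TreeSet<String>> asSets() {
        return words().collect(groupingBy(String::length).toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(asLists());
        System.out.println(asSets());
    }
}
```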
Overall this seems a much more approachable set of Collectors. Still a
few fine details to work out, including:
- Does "GroupingCollector extends Collector" simplify or complicate?
- Naming of everything
- Do we want to add back the "grouping to explicit collection" form?
More information about the lambda-libs-spec-experts
mailing list