Tabulators -- a catalog
Brian Goetz
brian.goetz at oracle.com
Thu Dec 27 18:23:42 PST 2012
Here's a catalog of the currently implemented Tabulators.
1. The groupBy family. Currently, there are 16 of these:
{ map vs mapMulti } x { explicit factories or not }
x { reduce forms }
where the reduce forms are:
- nothing (classic groupBy, group into collections)
- MutableReducer
- BinaryOperator<T> // straight reduce
- Function<T,U> + BinaryOperator<U> // map-reduce
The MutableReducer form is what we have been calling Accumulator today;
since all the tabulators are themselves MutableReducers, this second form
is what allows multi-level tabulations.
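A minimal sketch of a multi-level tabulation. The Tabulator API described in this message cannot be run as-is, so the sketch uses Collectors.groupingBy from the released java.util.stream as a stand-in; treating it as equivalent to groupBy-with-downstream is an assumption, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultiLevelGroupBy {
    // Two-level tabulation: group by first letter, then by length.
    // The inner groupingBy plays the role of the downstream reducer.
    // ASSUMPTION: words are non-empty, so charAt(0) is safe.
    static Map<Character, Map<Integer, List<String>>> byInitialThenLength(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(w -> w.charAt(0),
                Collectors.groupingBy(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "angle", "bean", "beet");
        System.out.println(byInitialThenLength(words));
    }
}
```

Because the downstream argument is itself a grouping collector, the nesting can go as deep as needed.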
Q: Does the mapMulti variant carry its weight? (It's not a lot of extra
code; these extra 8 methods are in total less than 50 lines of code.)
Q: Should the mapMulti variant be called something else, like groupByMulti?
Q: The first reduce form is classic groupBy; the others are
group+reduce. Should they be called groupedReduce / groupedAccumulate
for clarity?
Examples:
// map + no explicit factories + mutable reduce form
Map<K, D> groupBy(Function<T,K> classifier,
Accumulator<T,D> downstream)
// map + explicit factories + classic reduce
<T, K, C extends Collection<T>, M extends Map<K, C>>
Tabulator<T, M> groupBy(Function<? super T, ? extends K> classifier,
Supplier<M> mapFactory,
Supplier<C> rowFactory)
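To make the four reduce forms and the explicit-factory variant concrete, here is a hedged sketch using the released java.util.stream.Collectors as a stand-in for the groupBy tabulators. The correspondence to the forms listed above is an assumption, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class GroupByForms {
    // Reduce form 1: nothing (classic groupBy, rows collected into lists).
    static Map<Integer, List<String>> classic(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Reduce form 2: a downstream mutable reducer (here, a counter).
    static Map<Integer, Long> counted(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(String::length, Collectors.counting()));
    }

    // Reduce form 3: straight reduce with a BinaryOperator<T>.
    // Keeps the alphabetically smallest word of each length.
    static Map<Integer, Optional<String>> smallest(List<String> words) {
        Comparator<String> alphabetical = Comparator.naturalOrder();
        return words.stream().collect(
            Collectors.groupingBy(String::length,
                Collectors.reducing(BinaryOperator.minBy(alphabetical))));
    }

    // Reduce form 4: map-reduce with Function<T,U> + BinaryOperator<U>.
    // Sums word lengths per first letter; assumes non-empty words.
    static Map<Character, Integer> totalLength(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(w -> w.charAt(0),
                Collectors.reducing(0, String::length, Integer::sum)));
    }

    // Explicit factories: the caller chooses the map and row containers.
    static TreeMap<Integer, TreeSet<String>> sorted(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(String::length, TreeMap::new,
                Collectors.toCollection(TreeSet::new)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bb", "a", "cc", "ddd", "a");
        System.out.println(classic(words));
        System.out.println(counted(words));
        System.out.println(smallest(words));
        System.out.println(totalLength(words));
        System.out.println(sorted(words));
    }
}
```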
2. The mappedTo family. These take a Stream<T> and a function T->U and
produce a MapLikeThingy<T,U>.
Four forms:
// basic
<T, U> Tabulator<T, Map<T,U>>
mappedTo(Function<? super T, ? extends U> mapper)
// with merge function to handle duplicates
<T, U> Tabulator<T, Map<T,U>>
mappedTo(Function<? super T, ? extends U> mapper,
BinaryOperator<U> mergeFunction)
// with map factory
<T, U, M extends Map<T, U>> Tabulator<T, M>
mappedTo(Function<? super T, ? extends U> mapper,
Supplier<M> mapSupplier)
// with both factory and merge function
<T, U, M extends Map<T, U>> Tabulator<T, M>
mappedTo(Function<? super T, ? extends U> mapper,
BinaryOperator<U> mergeFunction,
Supplier<M> mapSupplier)
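A rough illustration of the mappedTo forms, using Collectors.toMap from the released java.util.stream with Function.identity() as the key function. Treating this as equivalent to mappedTo is an assumption. Note that the released toMap has no factory-only overload, so only three of the four forms are shown. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class MappedToForms {
    // Basic form: each element becomes a key, mapped to f(element).
    // Assumes the input has no duplicate elements.
    static Map<String, Integer> lengths(List<String> words) {
        return words.stream().collect(
            Collectors.toMap(Function.identity(), String::length));
    }

    // With a merge function: duplicate elements are combined, not rejected.
    static Map<String, Integer> occurrences(List<String> words) {
        return words.stream().collect(
            Collectors.toMap(Function.identity(), w -> 1, Integer::sum));
    }

    // With both a merge function and a map factory.
    static TreeMap<String, Integer> sortedOccurrences(List<String> words) {
        return words.stream().collect(
            Collectors.toMap(Function.identity(), w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("pear", "fig")));
        System.out.println(occurrences(Arrays.asList("fig", "pear", "fig")));
        System.out.println(sortedOccurrences(Arrays.asList("pear", "fig", "pear")));
    }
}
```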
Q: Is the name good enough?
Q: What should be the default merging behavior for the forms without an
explicit merger? Throw?
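For comparison, here is a small sketch of what the "throw" option looks like in practice. It relies on the released Collectors.toMap, whose two-argument form throws IllegalStateException on duplicate keys. Whether mappedTo should behave the same way is exactly the open question above. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateKeys {
    // Returns true if collecting without a merge function rejects the input.
    static boolean throwsOnDuplicate(List<String> words) {
        try {
            Map<String, Integer> ignored = words.stream().collect(
                Collectors.toMap(Function.identity(), String::length));
            return false;
        } catch (IllegalStateException e) {
            // Released toMap signals a duplicate key this way.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnDuplicate(Arrays.asList("fig", "pear")));
        System.out.println(throwsOnDuplicate(Arrays.asList("fig", "fig")));
    }
}
```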
3. Partition. Partitions a stream according to a predicate. The result
is always a two-element array of something. Five forms:
// Basic
<T> Tabulator<T, Collection<T>[]>
partition(Predicate<T> predicate)
// Explicit factory
<T, C extends Collection<T>> Tabulator<T, C[]>
partition(Predicate<T> predicate,
Supplier<C> rowFactory)
// Partitioned mutable reduce
<T, D> Tabulator<T, D[]>
partition(Predicate<T> predicate,
MutableReducer<T,D> downstream)
// Partitioned functional reduce
<T> Tabulator<T, T[]>
partition(Predicate<T> predicate,
T zero,
BinaryOperator<T> reducer)
// Partitioned functional map-reduce
<T, U> Tabulator<T, U[]>
partition(Predicate<T> predicate,
U zero,
Function<T, U> mapper,
BinaryOperator<U> reducer)
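A hand-rolled sketch of the basic partition form, returning a two-element array as described above. It is not built on the Tabulator API. The message does not say which index holds which half, so putting the failing elements at index 0 and the passing ones at index 1 is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PartitionSketch {
    // ASSUMPTION: result[0] holds elements failing the predicate,
    // result[1] holds elements passing it.
    @SuppressWarnings("unchecked")
    static <T> List<T>[] partition(Iterable<T> source, Predicate<? super T> predicate) {
        // Generic arrays cannot be created directly, hence the unchecked cast.
        List<T>[] result = (List<T>[]) new List<?>[2];
        result[0] = new ArrayList<>();
        result[1] = new ArrayList<>();
        for (T element : source) {
            result[predicate.test(element) ? 1 : 0].add(element);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer>[] parts = partition(Arrays.asList(1, 2, 3, 4, 5), n -> n % 2 == 0);
        System.out.println("odd:  " + parts[0]);
        System.out.println("even: " + parts[1]);
    }
}
```

The unchecked cast is one practical cost of an array-valued result with a generic element type.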
All of these implement MutableReducer/Accumulator/Tabulator, which means
any are suitable for use as the downstream reducer, allowing all of
these to be composed with each other. (Together all of these are about
300 lines of relatively straightforward code.)
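A hedged sketch of this kind of composition: a partition whose downstream is a groupBy whose downstream is a count. It uses the released Collectors.partitioningBy, groupingBy and counting as stand-ins, which is an assumption. The released partitioningBy returns a Map<Boolean, D> rather than a two-element array. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ComposedTabulation {
    // Partition by length threshold, then group each half by first letter,
    // then count the members of each group. Assumes non-empty words.
    static Map<Boolean, Map<Character, Long>> countByInitial(List<String> words, int minLength) {
        return words.stream().collect(
            Collectors.partitioningBy(w -> w.length() >= minLength,
                Collectors.groupingBy(w -> w.charAt(0),
                    Collectors.counting())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "ant", "axe", "banana");
        System.out.println(countByInitial(words, 4));
    }
}
```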
More? Fewer? Different?