Tabulators -- a catalog

Brian Goetz brian.goetz at oracle.com
Thu Dec 27 18:23:42 PST 2012


Here's a catalog of the currently implemented Tabulators.

1.  The groupBy family.  Currently, there are 16 of these:

{ map vs mapMulti } x { explicit factories or not }
                     x { reduce forms }

where the reduce forms are:
  - nothing (classic groupBy, group into collections)
  - MutableReducer
  - BinaryOperator<T>                  // straight reduce
  - Function<T,U> + BinaryOperator<U>  // map-reduce

The MutableReducer form is what we have been calling Accumulator today.
Since all the tabulators are themselves MutableReducers, this second form 
is what allows multi-level tabulations: one tabulator can serve as the 
downstream reducer of another.
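
To make the four reduce forms concrete, here is a rough sketch.  It is 
written against java.util.stream.Collectors from the released Java 8 API, 
not against the Tabulator API described in this message; the assumption is 
that groupingBy / counting / reducing behave like the groupBy forms listed 
above.  The class name, sample data and classifier are made up for 
illustration.

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

class GroupByForms {
    // Sample data and classifier are illustrative only.
    static final List<String> WORDS =
        Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
    static final Function<String, Character> FIRST = w -> w.charAt(0);

    // Form 1: nothing (classic groupBy, each group becomes a collection).
    static Map<Character, List<String>> classic() {
        return WORDS.stream().collect(Collectors.groupingBy(FIRST));
    }

    // Form 2: a mutable reducer as the downstream (here: a counter).
    static Map<Character, Long> counted() {
        return WORDS.stream().collect(
            Collectors.groupingBy(FIRST, Collectors.counting()));
    }

    // Form 3: BinaryOperator<T>, a straight reduce within each group.
    static Map<Character, String> concatenated() {
        return WORDS.stream().collect(
            Collectors.groupingBy(FIRST, Collectors.reducing("", String::concat)));
    }

    // Form 4: Function<T,U> + BinaryOperator<U>, a map-reduce within each group.
    static Map<Character, Integer> totalLength() {
        return WORDS.stream().collect(
            Collectors.groupingBy(FIRST,
                Collectors.reducing(0, String::length, Integer::sum)));
    }

    // Multi-level: the downstream reducer is itself a grouping.
    static Map<Character, Map<Integer, List<String>>> twoLevel() {
        return WORDS.stream().collect(
            Collectors.groupingBy(FIRST, Collectors.groupingBy(String::length)));
    }

    public static void main(String[] args) {
        System.out.println(classic());
        System.out.println(counted());
        System.out.println(concatenated());
        System.out.println(totalLength());
        System.out.println(twoLevel());
    }
}
```

The last method is the multi-level case: the outer grouping neither knows 
nor cares that its downstream is another grouping.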

Q: Does the mapMulti variant carry its weight?  (It's not a lot of extra 
code; these extra 8 methods are in total less than 50 lines of code.)

Q: Should the mapMulti variant be called something else, like groupByMulti?

Q: The first reduce form is classic groupBy; the others are 
group+reduce.  Should they be called groupedReduce / groupedAccumulate 
for clarity?

Examples:

   // map + no explicit factories + mutable reduce form
   Map<K, D> groupBy(Function<T,K> classifier,
                     Accumulator<T,D> downstream)

   // map + explicit factories + classic reduce
   <T, K, C extends Collection<T>, M extends Map<K, C>>
   Tabulator<T, M> groupBy(Function<? super T, ? extends K> classifier,
                           Supplier<M> mapFactory,
                           Supplier<C> rowFactory)
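
As a sketch of what the explicit factories buy you, the snippet below asks 
for a sorted map of sorted rows.  It again uses the released 
java.util.stream.Collectors API as a stand-in (an assumption; there the row 
factory is expressed through toCollection rather than a separate 
parameter).  The class and method names are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

class GroupByFactories {
    // Group words by length, choosing both the map type and the row type.
    static TreeMap<Integer, TreeSet<String>> byLength(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(
                String::length,                           // classifier
                TreeMap::new,                             // map factory
                Collectors.toCollection(TreeSet::new)));  // row factory
    }

    public static void main(String[] args) {
        System.out.println(byLength(Arrays.asList("pear", "fig", "plum", "kiwi", "fig")));
    }
}
```

The caller gets back the concrete types it asked for, so no cast is needed 
to use TreeMap or TreeSet methods on the result.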


2.  The mappedTo family.  These take a Stream<T> and a function T->U and 
produce a MapLikeThingy<T,U>.

Four forms:

     // basic
     <T, U> Tabulator<T, Map<T,U>>
     mappedTo(Function<? super T, ? extends U> mapper)

     // with merge function to handle duplicates
     <T, U> Tabulator<T, Map<T,U>>
     mappedTo(Function<? super T, ? extends U> mapper,
              BinaryOperator<U> mergeFunction)

     // with map factory
     <T, U, M extends Map<T, U>> Tabulator<T, M>
     mappedTo(Function<? super T, ? extends U> mapper,
              Supplier<M> mapSupplier)

     // with both factory and merge function
     <T, U, M extends Map<T, U>> Tabulator<T, M>
     mappedTo(Function<? super T, ? extends U> mapper,
              BinaryOperator<U> mergeFunction,
              Supplier<M> mapSupplier)

Q: Is the name good enough?

Q: What should be the default merging behavior for the forms without an 
explicit merger?  Throw?
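
For comparison, a sketch of the same shapes using Collectors.toMap from the 
released Java 8 API (a stand-in, not the mappedTo API above; there the key 
function is passed explicitly, so Function.identity() plays the role of 
"the element itself is the key").  In that API the form without a merge 
function throws IllegalStateException on duplicate keys, which is one 
possible answer to the second question.  Class and method names are 
hypothetical.

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

class MappedToForms {
    // Basic: element -> mapper(element); duplicate elements make toMap throw.
    static Map<String, Integer> lengths(List<String> words) {
        return words.stream().collect(
            Collectors.toMap(Function.identity(), String::length));
    }

    // With a merge function: duplicates are combined instead of rejected.
    static Map<String, Integer> occurrences(List<String> words) {
        return words.stream().collect(
            Collectors.toMap(Function.identity(), w -> 1, Integer::sum));
    }

    // With both a merge function and a map factory.
    static TreeMap<String, Integer> sortedOccurrences(List<String> words) {
        return words.stream().collect(
            Collectors.toMap(Function.identity(), w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("y", "x", "y");
        System.out.println(occurrences(words));
        System.out.println(sortedOccurrences(words));
        try {
            lengths(words);  // "y" appears twice and no merger was given
        } catch (IllegalStateException e) {
            System.out.println("duplicate rejected");
        }
    }
}
```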


3.  Partition.  Partitions a stream according to a predicate.  The result 
is always a two-element array of something.  Five forms:

     // Basic
     <T> Tabulator<T, Collection<T>[]>
     partition(Predicate<T> predicate)

     // Explicit factory
     <T, C extends Collection<T>> Tabulator<T, C[]>
     partition(Predicate<T> predicate,
               Supplier<C> rowFactory)

     // Partitioned mutable reduce
     <T, D> Tabulator<T, D[]>
     partition(Predicate<T> predicate,
               MutableReducer<T,D> downstream)

     // Partitioned functional reduce
     <T> Tabulator<T, T[]>
     partition(Predicate<T> predicate,
               T zero,
               BinaryOperator<T> reducer)

     // Partitioned functional map-reduce
     <T, U> Tabulator<T, U[]>
     partition(Predicate<T> predicate,
               U zero,
               Function<T, U> mapper,
               BinaryOperator<U> reducer)
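
A sketch of how the partition forms might look in use, again borrowing 
Collectors.partitioningBy from the released Java 8 API as a stand-in.  Note 
one difference: that API returns a Map<Boolean, D> keyed by the predicate 
result, where this message proposes a two-element array.  Class and method 
names and the sample predicates are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

class PartitionForms {
    // Basic: two collections, one per predicate outcome.
    static Map<Boolean, List<Integer>> basic(List<Integer> xs) {
        return xs.stream().collect(Collectors.partitioningBy(x -> x % 2 == 0));
    }

    // Explicit row factory, expressed here through toCollection.
    static Map<Boolean, TreeSet<Integer>> withFactory(List<Integer> xs) {
        return xs.stream().collect(
            Collectors.partitioningBy(x -> x % 2 == 0,
                Collectors.toCollection(TreeSet::new)));
    }

    // Partitioned functional reduce: zero + BinaryOperator<T>.
    static Map<Boolean, Integer> sums(List<Integer> xs) {
        return xs.stream().collect(
            Collectors.partitioningBy(x -> x % 2 == 0,
                Collectors.reducing(0, Integer::sum)));
    }

    // Partitioned functional map-reduce: zero + Function<T,U> + BinaryOperator<U>.
    static Map<Boolean, Integer> lengthSums(List<String> words) {
        return words.stream().collect(
            Collectors.partitioningBy(w -> w.length() > 3,
                Collectors.reducing(0, String::length, Integer::sum)));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(basic(xs));
        System.out.println(withFactory(xs));
        System.out.println(sums(xs));
        System.out.println(lengthSums(Arrays.asList("a", "tree", "house")));
    }
}
```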



All of these implement MutableReducer/Accumulator/Tabulator, which means 
any of them is suitable for use as the downstream reducer, so all of 
them can be composed with each other.  (Together all of these are about 
300 lines of relatively straightforward code.)
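
A small sketch of that composition across families: a grouping whose 
downstream is a partition, whose downstream is in turn a count.  As before 
this uses the released java.util.stream.Collectors API as a stand-in for 
the tabulators above, and the class name, data, classifier and predicate 
are invented for the example.

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

class ComposedTabulation {
    // For each first letter, count how many words are long and how many are not.
    static Map<Character, Map<Boolean, Long>> longWordsByInitial(List<String> words) {
        Function<String, Character> initial = w -> w.charAt(0);
        Predicate<String> isLong = w -> w.length() > 5;
        return words.stream().collect(
            Collectors.groupingBy(initial,
                Collectors.partitioningBy(isLong,
                    Collectors.counting())));
    }

    public static void main(String[] args) {
        System.out.println(longWordsByInitial(
            Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry")));
    }
}
```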

More?  Fewer?  Different?



More information about the lambda-libs-spec-observers mailing list