groupBy / reduceBy
Remi Forax
forax at univ-mlv.fr
Sun Dec 9 06:54:40 PST 2012
On 12/08/2012 09:38 PM, Brian Goetz wrote:
> So, I hate groupBy/reduceBy. Not that I hate the idea, just their
> current realization.
>
> Reasons to hate them:
>
> - They intrude Map and Collection into the Stream API, whereas
> otherwise there would be no connection (except Iterator) to Old
> Collections. This falls short of a key goal, which is for Streams to
> be a bridge from Old Collections to New Collections in the future.
> We've already severed the 32-bit size limitation; we've distanced
> ourselves from the pervasive mutability of Old Collections; this is
> the remaining connection that needs to be severed.
>
> - They are limited. You can do one level of group-by, but you can't
> do two; it requires gymnastics to, for example, take a
> Stream<Transaction> and do a multi-level tabulation like grouping into
> a Map<Buyer, Map<Seller, Collection<Transaction>>>. At the same time,
> they offer limited control over what kind of Map to use, what kind of
> Collection to use for the values for a given grouping, etc.
>
> - Guava-hostile. Guava users would probably like groupBy to return a
> Multimap. This should be easy, but currently is not.
>
> - The name reduceBy makes it completely unclear what it does.
The other problem with reduceBy is that the combiner is only needed in
the parallel case, not in the serial one.
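A minimal sketch of that point, using the three-argument collect form
(supplier / accumulator / combiner) as a stand-in for reduceBy's shape: in a
sequential run only the accumulator is invoked, while a parallel run also
uses the combiner to merge partial containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CombinerDemo {
    // Collect into a list with an explicit supplier/accumulator/combiner.
    // Sequentially only ArrayList::add runs; in parallel the partial
    // lists are merged with ArrayList::addAll (the combiner).
    static List<Integer> collect(boolean parallel) {
        Stream<Integer> s = Stream.of(1, 2, 3, 4);
        if (parallel) {
            s = s.parallel();
        }
        return s.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        // Same result either way; the combiner preserves encounter order.
        System.out.println(collect(false).equals(collect(true)));  // prints "true"
    }
}
```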
>
> - Too-limited control over whether to use map-merging (required if
> you want to preserve encounter order, but probably slower) or
> accumulate results directly into a single shared ConcurrentMap
> (probably faster, but only if you don't care about encounter order).
> Currently we key off of having an encounter order here, but this
> should be a user choice, not a framework choice.
>
> These negatives play into the motivation for some upcoming proposals
> about reduce forms, which will propose a new, generalized formulation
> for these methods that address these negatives. Key observations:
> - groupBy is really just reduceBy where the reduce seed is "new
> ArrayList" and the combiner function is ArrayList::add
You mean: the supplier is ArrayList::new and the reducer is ArrayList::add
(the combiner is ArrayList::addAll).
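A serial sketch of that equivalence (the reduceBy method and its signature
here are hypothetical, not the draft API): groupBy falls out of a reduceBy
once the supplier is ArrayList::new and the reducer is ArrayList::add; the
combiner would only come into play when merging partial maps in parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class GroupByAsReduceBy {
    // Hypothetical serial reduceBy: classify each element, create the
    // per-key container on demand (supplier), fold the element in (reducer).
    static <T, K, D> Map<K, D> reduceBy(Stream<T> stream,
                                        Function<? super T, ? extends K> classifier,
                                        Supplier<D> supplier,
                                        BiConsumer<D, ? super T> reducer) {
        Map<K, D> map = new HashMap<>();
        stream.forEachOrdered(t -> {
            D d = map.computeIfAbsent(classifier.apply(t), k -> supplier.get());
            reducer.accept(d, t);
        });
        return map;
    }

    public static void main(String[] args) {
        // groupBy = reduceBy with ArrayList::new / List::add.
        Map<Integer, List<String>> byLength =
            reduceBy(Stream.of("a", "bb", "cc"), String::length,
                     ArrayList::new, List::add);
        System.out.println(byLength);  // prints "{1=[a], 2=[bb, cc]}"
    }
}
```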
> - reduceBy is really just a reduce whose combiner function
> incorporates some mutable map mechanics
>
But the issue is that even if it's just a reduce, users will not see it
as a reduce, i.e. we still have to provide a groupBy.
Let's try to do something:
- we don't need a map, but something with get() and put().
- we don't need a collection, but something with new() and add().
So we can be fully generic if the user sends 4 lambdas. In fact they
have to be grouped: a put() without a get() is useless.
with
  interface Mapping<K,V> {
    V get(Object key);
    V put(K key, V value);
  }
and Destination (which already exists; we need it for into(), but
currently it has only the addAll method):
  interface Destination<T> {
    public boolean add(T element);
    public void addAll(Stream<? extends T> stream);
  }
groupBy can be written:
  <U, D extends Destination<? super T>, M extends Mapping<? super U, D>>
  M groupBy(Function<? super T, ? extends U> classifier,
            M mapping,
            Supplier<? extends D> destinationSupplier)
and called like this:
  Map<String, List<Person>> map =
      personStream.groupBy(Person::getName, new HashMap<>(), ArrayList<Person>::new);
or, a little better, if method references accept the diamond syntax (I
don't remember what we decided here):
  Map<String, List<Person>> map =
      personStream.groupBy(Person::getName, new HashMap<>(), ArrayList<>::new);
This removes all dependencies on the old collections.
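To make the shape concrete, here is a runnable sketch of that generic
groupBy as a static helper over a Stream; MapMapping and ListDestination are
hypothetical adapters standing in for a HashMap and an ArrayList retrofitted
to the proposed interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class GenericGroupBy {
    // The two proposed abstractions.
    interface Mapping<K, V> {
        V get(Object key);
        V put(K key, V value);
    }
    interface Destination<T> {
        boolean add(T element);
    }

    // The proposed signature, written as a static helper over a Stream.
    static <T, U, D extends Destination<? super T>, M extends Mapping<? super U, D>>
    M groupBy(Stream<T> stream,
              Function<? super T, ? extends U> classifier,
              M mapping,
              Supplier<? extends D> destinationSupplier) {
        stream.forEachOrdered(element -> {
            U key = classifier.apply(element);
            D destination = mapping.get(key);
            if (destination == null) {          // first element for this key:
                destination = destinationSupplier.get();
                mapping.put(key, destination);  // create and register a container
            }
            destination.add(element);
        });
        return mapping;
    }

    // Hypothetical adapters standing in for a retrofitted HashMap/ArrayList.
    static class MapMapping<K, V> implements Mapping<K, V> {
        final Map<K, V> map = new HashMap<>();
        public V get(Object key) { return map.get(key); }
        public V put(K key, V value) { return map.put(key, value); }
    }
    static class ListDestination<T> implements Destination<T> {
        final List<T> list = new ArrayList<>();
        public boolean add(T element) { return list.add(element); }
    }

    public static void main(String[] args) {
        MapMapping<Integer, ListDestination<String>> byLength =
            groupBy(Stream.of("a", "bb", "cc"), String::length,
                    new MapMapping<>(), ListDestination::new);
        System.out.println(byLength.get(2).list);  // prints "[bb, cc]"
    }
}
```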
Also, the interface Mapping can be changed to something like
  interface Mapping<K,V> {
    V lookup(K key);
    void register(K key, V value);
  }
if we add lookup and register as default methods to Map.
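One possible shape for that retrofit (purely hypothetical; java.util.Map has
no lookup/register today): a bridge interface that extends Map and supplies
the two methods as defaults delegating to get/put, so any Map subclass can
opt in with no extra code.

```java
import java.util.HashMap;
import java.util.Map;

public class MappingRetrofit {
    // The renamed abstraction from the message above.
    interface Mapping<K, V> {
        V lookup(K key);
        void register(K key, V value);
    }

    // Hypothetical bridge: lookup/register come for free as default
    // methods delegating to Map's get/put.
    interface MapMapping<K, V> extends Map<K, V>, Mapping<K, V> {
        default V lookup(K key) { return get(key); }
        default void register(K key, V value) { put(key, value); }
    }

    // A HashMap that is also a Mapping, with no extra code.
    static class HashMapping<K, V> extends HashMap<K, V>
            implements MapMapping<K, V> {}

    public static void main(String[] args) {
        Mapping<String, Integer> mapping = new HashMapping<>();
        mapping.register("answer", 42);
        System.out.println(mapping.lookup("answer"));  // prints "42"
    }
}
```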
cheers,
Rémi