Forms for reduce() -- part 1
Brian Goetz
brian.goetz at oracle.com
Mon Dec 10 12:54:05 PST 2012
I have been doing some brainstorming on forms for "fold". My primary
goals for revisiting this include:
- As mentioned in an earlier note, I want to get Map and Collection
out of the Streams API (groupBy and reduceBy currently intrude these).
This message lays the groundwork for this and I will follow up on these
in a separate note. As I noted, there are many things currently wrong
with the current groupBy/reduceBy that I want to fix.
- Support "mutable fold" cases better, where the "seed" is really a
mutable container (like a StringBuffer.)
I'll start with use cases. There are some that fit purely into a
traditional functional model, and others that fit better into a mutable
model. While one can wedge one into the other, I think it may be better
to be explicit about both.
I am not suggesting naming right now, they could all be called reduce,
though we may want to use different names to describe the functional vs
mutable cases.
Use cases -- purely functional
------------------------------
1. Homogeneous operations on monoid (e.g., sum). Here, there is a
monoid with a known zero.
T reduce(T zero, BinaryOperator<T> reducer)
2. Homogeneous operations on non-monoids (e.g., min). Here, there is
no sensible zero, so we use Optional to reflect "nothing there". Ideally
we would like to delay boxing to Optional until the very last operation
(in other words, use (boolean, T) as the internal state and box to
Optional<T> at the very end.)
Optional<T> reduce(BinaryOperator<T> reducer)
3. Nonhomogeneous operations (aka foldl, such as "sum of weights").
This requires an additional combiner function for this to work in
parallel.
<U> U reduce(U zero, (U,T) -> U reducer, (U,U -> U) combiner)
<U> Optional<U> reduce(T->U first, (U,T) -> U reducer,
(U,U -> U) combiner)
Note that most cases where we might be inclined to return Optional<U>
can be written as stream.map(T->U).reduce(BinaryOperator<T>).
Doug points out: if we went with "null means nothing", we wouldn't need
the optional forms.
This is basically what we have now, though we're currently calling the
last form "fold". Doug has suggested we call them all reduce.
Sub-question: people are constantly pointing out "but you don't need the
combiner for the serial case." My orientation here is that the serial
case is a special case, and while we want to ensure that those cases are
well-served, we don't necessarily want to distort the API to include
things that *only* work in the serial case.
Use cases -- mutable
--------------------
Many fold-like operations are better expressed with mutable state. We
could easily simulate them with the foldl form, but it may well be
better to call this form out specially. In these cases, there is also
often a distinct internal and external representation. I'll give them
the deliberately stupid name mReduce for now.
The general form is:
<I, R> mReduce(Supplier<I> makeEmpty,
BiBlock<I, T> addElement,
BiBlock<I, I> combineResults,
Function<I, E> getFinalResult)
Here, I is the intermediate form, and E is the result. There are many
cases where computations with an intermediate form is more efficient, so
we want to maintain the intermediate form for as long as possible --
ideally until the last possible minute (when the whole reduction is done.)
The analogue of reducer/combiner in the functional forms is "accept a
new element" (addElement) and "combine one intermediate form with
another" (combineResults).
Examples:
3. Average. Here, we use an array of two ints to hold length and
count. (Alternately we could use a custom tuple class.) Our
intermediate form is int[2] and our final form is Double.
Double average = integers.mReduce(() -> new int[2],
(a, i) -> { a[0] += i; a[1]++ },
(a, b) -> { a[0] += b[0]; a[1] += b[1] },
a -> (double) a[0] / a[1]);
Here, we maintain the int[2] form all the way throughout the
computation, including as we combine up the tree, and only convert to
double at the last minute.
4. String concatenation
The signatures of the SAMs in mReduce were chosen to work with existing
builder-y classes such as StringBuffer or ArrayList. We can do string
concatenation using the functional form using String::concat, but it is
inefficient -- lots of copying as we go up the tree. We can still use a
mutable fold to do a concatenation with StringBuilder and mReduce. It
has the nice property that all the arguments already have methods that
have the right signature, so we can do it all with method refs.
String s = strings.mReduce(StringBuilder::new,
StringBuilder::append,
StringBuilder::append,
StringBuilder::toString);
In this example, the two append method refs are targeting different
versions of StringBuilder.append; the first is append(String) and the
second is append(StringBuilder). But the compiler will figure this out.
5. toArray
We can express "toArray" as a mutable fold using ArrayList to accumulate
values and converting to an array at the end, just as with StringBuilder:
Object[] array = foos.reduce(ArrayList::new,
ArrayList::add,
ArrayList::addAll,
ArrayList::toArray);
There are other mutable reduction use cases too. For example, sort can
be implemented by providing a "insert in order" and a "merge sorted
lists" method. While these are not necessarily the most efficient
implementation, they may well make reasonable last-ditch defaults.
Both of these examples use separate internal forms (StringBuffer,
ArrayList) and external forms (String, array).
Finally, for reasons that may become clearer in the next message, I
think we should consider having an abstraction for "Reducer" or
"Reduction" that captures all the bits needed for a reduction. This
would allow the averager above to be reused:
double average = integers.reduce(Reducers.INT_AVERAGER);
This turns into a win when we try to recast groupBy/reduceBy into being
general reductions (next message).
So, summary:
Functional forms:
public U reduce(final U seed, final BinaryOperator<U> op) {
public Optional<U> reduce(BinaryOperator<U> op) {
public <R> R reduce(R base, Combiner<R, U, R> reducer,
BinaryOperator<R> combiner) {
Mutable form:
public <I, R> R reduce(Supplier<I> baseFactory,
BiBlock<I, U> reducer,
BiBlock<I, I> combiner,
Function<I, R> finalResultMapper) {
(and possibly a mutable form for special case where I=R)
Possibly a form for a canned Reducer:
public<R> R reduce(Reducer<T,R> reducer);
More information about the lambda-libs-spec-observers
mailing list