Forms for reduce() -- part 1

Brian Goetz brian.goetz at oracle.com
Mon Dec 10 12:54:05 PST 2012


I have been doing some brainstorming on forms for "fold".  My primary 
goals for revisiting this include:

  - As mentioned in an earlier note, I want to get Map and Collection 
out of the Streams API (groupBy and reduceBy currently intrude these). 
This message lays the groundwork for this and I will follow up on these 
in a separate note.  As I noted, there are many things currently wrong 
with the current groupBy/reduceBy that I want to fix.

  - Support "mutable fold" cases better, where the "seed" is really a 
mutable container (like a StringBuffer.)

I'll start with use cases.  There are some that fit purely into a 
traditional functional model, and others that fit better into a mutable 
model.  While one can wedge one into the other, I think it may be better 
to be explicit about both.

I am not suggesting naming right now, they could all be called reduce, 
though we may want to use different names to describe the functional vs 
mutable cases.


Use cases -- purely functional
------------------------------

1.  Homogeneous operations on monoid (e.g., sum).  Here, there is a 
monoid with a known zero.

   T reduce(T zero, BinaryOperator<T> reducer)

2.  Homogeneous operations on non-monoids (e.g., min).  Here, there is 
no sensible zero, so we use Optional to reflect "nothing there". Ideally 
we would like to delay boxing to Optional until the very last operation 
(in other words, use (boolean, T) as the internal state and box to 
Optional<T> at the very end.)

   Optional<T> reduce(BinaryOperator<T> reducer)

3.  Nonhomogeneous operations (aka foldl, such as "sum of weights"). 
This requires an additional combiner function for this to work in 
parallel.

  <U> U reduce(U zero, (U,T) -> U reducer, (U,U -> U) combiner)
  <U> Optional<U> reduce(T->U first, (U,T) -> U reducer,
                         (U,U -> U) combiner)

Note that most cases where we might be inclined to return Optional<U> 
can be written as stream.map(T->U).reduce(BinaryOperator<T>).

Doug points out: if we went with "null means nothing", we wouldn't need 
the optional forms.

This is basically what we have now, though we're currently calling the 
last form "fold".  Doug has suggested we call them all reduce.

Sub-question: people are constantly pointing out "but you don't need the 
combiner for the serial case."  My orientation here is that the serial 
case is a special case, and while we want to ensure that those cases are 
well-served, we don't necessarily want to distort the API to include 
things that *only* work in the serial case.


Use cases -- mutable
--------------------

Many fold-like operations are better expressed with mutable state.  We 
could easily simulate them with the foldl form, but it may well be 
better to call this form out specially.  In these cases, there is also 
often a distinct internal and external representation.  I'll give them 
the deliberately stupid name mReduce for now.

The general form is:

  <I, R> mReduce(Supplier<I> makeEmpty,
                 BiBlock<I, T> addElement,
                 BiBlock<I, I> combineResults,
                 Function<I, E> getFinalResult)

Here, I is the intermediate form, and E is the result.  There are many 
cases where computations with an intermediate form is more efficient, so 
we want to maintain the intermediate form for as long as possible -- 
ideally until the last possible minute (when the whole reduction is done.)

The analogue of reducer/combiner in the functional forms is "accept a 
new element" (addElement) and "combine one intermediate form with 
another" (combineResults).

Examples:

3.  Average.  Here, we use an array of two ints to hold length and 
count.  (Alternately we could use a custom tuple class.)  Our 
intermediate form is int[2] and our final form is Double.

   Double average = integers.mReduce(() -> new int[2],
                        (a, i) -> { a[0] += i; a[1]++ },
                        (a, b) -> { a[0] += b[0]; a[1] += b[1] },
                        a -> (double) a[0] / a[1]);

Here, we maintain the int[2] form all the way throughout the 
computation, including as we combine up the tree, and only convert to 
double at the last minute.

4.  String concatenation

The signatures of the SAMs in mReduce were chosen to work with existing 
builder-y classes such as StringBuffer or ArrayList.  We can do string 
concatenation using the functional form using String::concat, but it is 
inefficient -- lots of copying as we go up the tree.  We can still use a 
mutable fold to do a concatenation with StringBuilder and mReduce.  It 
has the nice property that all the arguments already have methods that 
have the right signature, so we can do it all with method refs.

   String s = strings.mReduce(StringBuilder::new,
                              StringBuilder::append,
                              StringBuilder::append,
                              StringBuilder::toString);

In this example, the two append method refs are targeting different 
versions of StringBuilder.append; the first is append(String) and the 
second is append(StringBuilder).  But the compiler will figure this out.

5.  toArray

We can express "toArray" as a mutable fold using ArrayList to accumulate 
values and converting to an array at the end, just as with StringBuilder:

   Object[] array = foos.reduce(ArrayList::new,
                                ArrayList::add,
                                ArrayList::addAll,
                                ArrayList::toArray);

There are other mutable reduction use cases too.  For example, sort can 
be implemented by providing a "insert in order" and a "merge sorted 
lists" method.  While these are not necessarily the most efficient 
implementation, they may well make reasonable last-ditch defaults.

Both of these examples use separate internal forms (StringBuffer, 
ArrayList) and external forms (String, array).


Finally, for reasons that may become clearer in the next message, I 
think we should consider having an abstraction for "Reducer" or 
"Reduction" that captures all the bits needed for a reduction.  This 
would allow the averager above to be reused:

   double average = integers.reduce(Reducers.INT_AVERAGER);

This turns into a win when we try to recast groupBy/reduceBy into being 
general reductions (next message).


So, summary:

Functional forms:

     public U reduce(final U seed, final BinaryOperator<U> op) {

     public Optional<U> reduce(BinaryOperator<U> op) {

     public <R> R reduce(R base, Combiner<R, U, R> reducer, 
BinaryOperator<R> combiner) {

Mutable form:

     public <I, R> R reduce(Supplier<I> baseFactory,
                            BiBlock<I, U> reducer,
                            BiBlock<I, I> combiner,
                            Function<I, R> finalResultMapper) {

(and possibly a mutable form for special case where I=R)

Possibly a form for a canned Reducer:

    public<R> R reduce(Reducer<T,R> reducer);






More information about the lambda-libs-spec-observers mailing list