Reducing reduce

Brian Goetz brian.goetz at oracle.com
Mon Feb 18 12:20:23 PST 2013


Circling back to this (i.e., "reducing reduce", redux):

There are a lot of considerations here, many mostly accidental (e.g., 
consequences of erasure and the primitive/reference divide).

The three-arg functional reduce form is functionally equivalent to the 
two-arg form, except that there are some constructions that are more 
efficient to handle in the three arg form.  However, the best example we 
came up with, Joe's string compare, suffers because he had to use 
boxing.  So we're currently in a place where the best example to support 
this form has other defects that make the form hard to support.  And, 
any form of functional reduce on a reference would likely result in a 
lot of object creation, so the optimization of eliding some of the 
mapping would have to overcome that.

Further, one can still handle this without boxing using collect() and an 
explicit mutable result holder.  On the other hand, if/when the language 
acquires tuples, it will be a very different story, and this form would 
become infinitely more useful.

So I think the evidence weighs slightly in favor of ditching this form 
for now (though I'd feel better if people didn't have to use either an 
ad-hoc class or a single-element array as the data box when using the 
collect() form.)

Secondarily, ditching the three-arg form from Stream would remove one 
element of support for naming reduce and collect differently; part of 
the motivation for a different name was that the three-arg collect and 
three-arg reduce overloaded very poorly if they had the same name. 
However, I think we should resist the temptation to act on this.  I 
think (a) there is pedagogical value in separating function and mutable 
reduce forms, and (b) if we do this, we slam the door on the more 
flexible version, which will badly bite us in a tupled future.

We might still consider the three-arg version for IntStream.  That's the 
case where Joe's example works.

On 2/11/2013 11:41 AM, Brian Goetz wrote:
> Now that we've added all the shapes of map() to Stream (map to
> ref/int/long/double), and we've separated functional reduce (currently
> called reduce) from mutable reduce (currently called collect), I think
> that leaves room for taking out one of the reduce methods from Stream:
>
>      <U> U reduce(U identity,
>                   BiFunction<U, ? super T, U> accumulator,
>                   BinaryOperator<U> reducer);
>
> This is the one that confuses everyone anyway, and I don't think we need
> it any more.
>
> The argument for having this form instead of discrete map+reduce are:
>   - fused map+reduce reduces boxing
>   - this three-arg form can also fold filtering into the accumulation
>
> However, since we now have primitive-bearing map methods, and we can do
> filtering before and after the map, is this form really carrying its
> weight?  Specifically because people find it counterintuitive, we should
> consider dropping it and guiding people towards map+reduce.
>
> For example, "sum of pages" over a stream of Documents is better written
> as:
>
>    docs.map(Document::getPageCount).sum()
>
> rather than
>
>    docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum)
>
> The big place where we need three-arg reduce is when we're folding into
> a mutable store.  But that's now handled by collect().
>
> Have I missed any use cases that would justify keeping this form?
>


More information about the lambda-libs-spec-observers mailing list