Reducing reduce

Joe Bowbeer joe.bowbeer at gmail.com
Mon Feb 18 15:16:38 PST 2013


I wouldn't have thought that boxed had anything to do with "our" 3-arg
reduce example.  Boxed is by-product of my decision to use a primitive
generator (intRange).  I could have picked a different generator and then I
wouldn't have needed boxed(), yet the 3-arg reduce form would be unaffected.

There are lots of applications for prefix-sums.  Guy Blelloch listed 13 in
1993, and string-compare just happened to be at the top of the list, where
I started:

http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf

I have a couple of related questions, which I think may be raised by others:

1. Why don't we have a 3-arg mapreduce like Guy Steele discusses in his
Parallel-Not talks?

http://vimeo.com/6624203

(or a map-scan-zip?)

2. Why don't we have a parallel fold (map+combine) like Rich Hickey added
to Clojure?

http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html

--Joe



On Mon, Feb 18, 2013 at 12:20 PM, Brian Goetz <brian.goetz at oracle.com>wrote:

> Circling back to this (i.e., "reducing reduce", redux):
>
> There are a lot of considerations here, many mostly accidental (e.g.,
> consequences of erasure and the primitive/reference divide).
>
> The three-arg functional reduce form is functionally equivalent to the
> two-arg form, except that there are some constructions that are more
> efficient to handle in the three arg form.  However, the best example we
> came up with, Joe's string compare, suffers because he had to use boxing.
>  So we're currently in a place where the best example to support this form
> has other defects that make the form hard to support.  And, any form of
> functional reduce on a reference would likely result in a lot of object
> creation, so the optimization of eliding some of the mapping would have to
> overcome that.
>
> Further, one can still handle this without boxing using collect() and an
> explicit mutable result holder.  On the other hand, if/when the language
> acquires tuples, it will be a very different story, and this form would
> become infinitely more useful.
>
> So I think the evidence weighs slightly in favor of ditching this form for
> now (though I'd feel better if people didn't have to use either an ad-hoc
> class or a single-element array as the data box when using the collect()
> form.)
>
> Secondarily, ditching the three-arg form from Stream would remove one
> element of support for naming reduce and collect differently; part of the
> motivation for a different name was that the three-arg collect and
> three-arg reduce overloaded very poorly if they had the same name. However,
> I think we should resist the temptation to act on this.  I think (a) there
> is pedagogical value in separating function and mutable reduce forms, and
> (b) if we do this, we slam the door on the more flexible version, which
> will badly bite us in a tupled future.
>
> We might still consider the three-arg version for IntStream.  That's the
> case where Joe's example works.
>
>
> On 2/11/2013 11:41 AM, Brian Goetz wrote:
>
>> Now that we've added all the shapes of map() to Stream (map to
>> ref/int/long/double), and we've separated functional reduce (currently
>> called reduce) from mutable reduce (currently called collect), I think
>> that leaves room for taking out one of the reduce methods from Stream:
>>
>>      <U> U reduce(U identity,
>>                   BiFunction<U, ? super T, U> accumulator,
>>                   BinaryOperator<U> reducer);
>>
>> This is the one that confuses everyone anyway, and I don't think we need
>> it any more.
>>
>> The argument for having this form instead of discrete map+reduce are:
>>   - fused map+reduce reduces boxing
>>   - this three-arg form can also fold filtering into the accumulation
>>
>> However, since we now have primitive-bearing map methods, and we can do
>> filtering before and after the map, is this form really carrying its
>> weight?  Specifically because people find it counterintuitive, we should
>> consider dropping it and guiding people towards map+reduce.
>>
>> For example, "sum of pages" over a stream of Documents is better written
>> as:
>>
>>    docs.map(Document::**getPageCount).sum()
>>
>> rather than
>>
>>    docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum)
>>
>> The big place where we need three-arg reduce is when we're folding into
>> a mutable store.  But that's now handled by collect().
>>
>> Have I missed any use cases that would justify keeping this form?
>>
>>


More information about the lambda-libs-spec-observers mailing list