Reducing reduce
Brian Goetz
brian.goetz at oracle.com
Mon Feb 11 09:12:01 PST 2013
Thanks, Joe. I knew I was missing some use cases. This is definitely a
case where the fused version is more efficient, since it can elide some
work based on the previous comparison state.
On 2/11/2013 11:57 AM, Joe Bowbeer wrote:
> My parallel string-compare sample provides two implementations (below).
>
> Will both of these survive your changes?
>
> bitbucket.org/joebowbeer/stringcompare
> <http://bitbucket.org/joebowbeer/stringcompare>
>
> int compareMapReduce(String s1, String s2) {
> assert s1.length() == s2.length();
> return intRange(0, s1.length()).parallel()
> .map(i -> compare(s1.charAt(i), s2.charAt(i)))
> .reduce(0, (l, r) -> (l != 0) ? l : r);
> }
>
> int compareBoxedReduce(String s1, String s2) {
> assert s1.length() == s2.length();
> return intRange(0, s1.length()).parallel().boxed()
> .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)),
> (l, r) -> (l != 0) ? l : r);
> }
>
>
>
>
> The person who sold me the second form told me it would "burn less
> heat". He said that I could optimize my map/reduce by having it "not
> even calculate *f* if the left operand is nonzero, by combining the map
> and reduce steps into a fold."
>
> What is that person going to sell me now?
>
> Joe
>
>
> On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
> Now that we've added all the shapes of map() to Stream (map to
> ref/int/long/double), and we've separated functional reduce
> (currently called reduce) from mutable reduce (currently called
> collect), I think that leaves room for taking out one of the reduce
> methods from Stream:
>
> <U> U reduce(U identity,
> BiFunction<U, ? super T, U> accumulator,
> BinaryOperator<U> reducer);
>
> This is the one that confuses everyone anyway, and I don't think we
> need it any more.
>
> The argument for having this form instead of discrete map+reduce are:
> - fused map+reduce reduces boxing
> - this three-arg form can also fold filtering into the accumulation
>
> However, since we now have primitive-bearing map methods, and we can
> do filtering before and after the map, is this form really carrying
> its weight? Specifically because people find it counterintuitive,
> we should consider dropping it and guiding people towards map+reduce.
>
> For example, "sum of pages" over a stream of Documents is better
> written as:
>
> docs.map(Document::__getPageCount).sum()
>
> rather than
>
> docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum)
>
> The big place where we need three-arg reduce is when we're folding
> into a mutable store. But that's now handled by collect().
>
> Have I missed any use cases that would justify keeping this form?
>
>
More information about the lambda-libs-spec-observers
mailing list