Reducing reduce

Brian Goetz brian.goetz at oracle.com
Mon Feb 11 09:12:01 PST 2013


Thanks, Joe.  I knew I was missing some use cases.  This is definitely a 
case where the fused version is more efficient, since it can elide some 
work based on the previous comparison state.

On 2/11/2013 11:57 AM, Joe Bowbeer wrote:
> My parallel string-compare sample provides two implementations (below).
>
> Will both of these survive your changes?
>
> bitbucket.org/joebowbeer/stringcompare
> <http://bitbucket.org/joebowbeer/stringcompare>
>
>    int  compareMapReduce(String  s1,  String  s2)  {
>      assert  s1.length()  ==  s2.length();
>      return  intRange(0,  s1.length()).parallel()
>          .map(i  ->  compare(s1.charAt(i),  s2.charAt(i)))
>          .reduce(0,  (l,  r)  ->  (l  !=  0)  ?  l  :  r);
>    }
>
>    int  compareBoxedReduce(String  s1,  String  s2)  {
>      assert  s1.length()  ==  s2.length();
>      return  intRange(0,  s1.length()).parallel().boxed()
>          .reduce(0,  (l,  i)  ->  (l  !=  0)  ?  l  :  compare(s1.charAt(i),  s2.charAt(i)),
>                     (l,  r)  ->  (l  !=  0)  ?  l  :  r);
>    }
>
>
>
>
> The person who sold me the second form told me it would "burn less
> heat". He said that I could optimize my map/reduce by having it "not
> even calculate *f* if the left operand is nonzero, by combining the map
> and reduce steps into a fold."
>
> What is that person going to sell me now?
>
> Joe
>
>
> On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     Now that we've added all the shapes of map() to Stream (map to
>     ref/int/long/double), and we've separated functional reduce
>     (currently called reduce) from mutable reduce (currently called
>     collect), I think that leaves room for taking out one of the reduce
>     methods from Stream:
>
>          <U> U reduce(U identity,
>                       BiFunction<U, ? super T, U> accumulator,
>                       BinaryOperator<U> reducer);
>
>     This is the one that confuses everyone anyway, and I don't think we
>     need it any more.
>
>     The argument for having this form instead of discrete map+reduce are:
>       - fused map+reduce reduces boxing
>       - this three-arg form can also fold filtering into the accumulation
>
>     However, since we now have primitive-bearing map methods, and we can
>     do filtering before and after the map, is this form really carrying
>     its weight?  Specifically because people find it counterintuitive,
>     we should consider dropping it and guiding people towards map+reduce.
>
>     For example, "sum of pages" over a stream of Documents is better
>     written as:
>
>        docs.map(Document::__getPageCount).sum()
>
>     rather than
>
>        docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum)
>
>     The big place where we need three-arg reduce is when we're folding
>     into a mutable store.  But that's now handled by collect().
>
>     Have I missed any use cases that would justify keeping this form?
>
>


More information about the lambda-libs-spec-experts mailing list