Yet another run at reduce

Brian Goetz brian.goetz at oracle.com
Mon Jan 7 17:31:42 PST 2013


I think Doug and I made some progress on reduce forms today.

Recap: there are two ways to do a reduction to a mutable data structure 
like a Map; a classic fold (where you create lots of little maps, one 
per leaf of the computation tree, accumulate leaf data into them in a 
thread-confined manner, then merge their contents up the tree into one 
big map), and a concurrent bombardment (where you create one big 
concurrent-safe map, and blast elements in from many threads.)  Call 
these "A" and "B".

The requirement for A is that:
  - your combining functions are associative
  - you can create multiple containers
  - you can incorporate a new element into a container
  - you can merge containers in a way that satisfies the obvious 
distributive requirement

If you meet these requirements, you can do a parallel A reduce that 
respects encounter order.
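To make the A shape concrete, here is a rough sequential sketch of the fold (class and method names are mine, not part of any proposed API): each leaf builds its own thread-confined map, and results are merged pairwise up the tree.

```java
import java.util.*;

public class FoldSketch {
    // A-style fold: each "leaf" of the computation tree gets its own map
    // (makeResult + accumulate, thread-confined, no synchronization needed).
    static Map<String, Integer> leafAccumulate(List<String> chunk) {
        Map<String, Integer> m = new HashMap<>();
        for (String s : chunk)
            m.merge(s, 1, Integer::sum);   // incorporate one element
        return m;
    }

    // Merging leaf results up the tree; this is the operation that must
    // distribute properly over accumulation for the fold to be correct.
    static Map<String, Integer> merge(Map<String, Integer> a,
                                      Map<String, Integer> b) {
        for (Map.Entry<String, Integer> e : b.entrySet())
            a.merge(e.getKey(), e.getValue(), Integer::sum);
        return a;
    }

    public static void main(String[] args) {
        // Two "leaves" processed independently, then merged in encounter order.
        Map<String, Integer> left  = leafAccumulate(Arrays.asList("a", "b", "a"));
        Map<String, Integer> right = leafAccumulate(Arrays.asList("b", "c"));
        System.out.println(merge(left, right));
    }
}
```

In a real parallel run the leaves would be processed by different fork/join tasks, but the leaf/merge structure is the same.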

The requirement for B is that:
  - your combining functions are associative and commutative
  - you can incorporate a new element into a container
  - your container supports concurrent incorporation

If you meet these requirements, you can do a parallel B reduce that does 
NOT respect order, but may be faster (or slower, if contention is high 
enough.)
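And the B shape, sketched with a shared ConcurrentHashMap bombarded from a thread pool (again, the names here are illustrative, not proposed API):

```java
import java.util.*;
import java.util.concurrent.*;

public class ConcurrentSketch {
    // B-style reduction: one shared concurrent-safe container; many threads
    // blast elements in. No merge step, but no encounter-order guarantee,
    // so the combining function must be commutative as well as associative.
    static Map<String, Integer> count(List<String> data) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String s : data)
            pool.submit(() -> counts.merge(s, 1, Integer::sum)); // atomic on CHM
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("a", "b", "a", "b", "c", "a")));
    }
}
```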

Key observation: A "B" reducer can become an "A" reducer simply by 
adding the merge ability, which is no harder than for regular A reducers.

So rather than have A reducers and B reducers, we can have A reducers 
and A+B reducers.  It's only marginally more work for B reducer writers.

So...

public interface Reducer<T, R> {
     boolean isConcurrent();
     R makeResult();
     void accumulate(R result, T value);
     R combine(R result, R other);
}
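For example, a word-count reducer over a plain HashMap would implement this interface as an A reducer (a concurrent variant would differ only in returning 'true' from isConcurrent and using a concurrent map); the driver in main is just a stand-in for the library:

```java
import java.util.*;

// The Reducer interface as sketched in this message.
interface Reducer<T, R> {
    boolean isConcurrent();
    R makeResult();
    void accumulate(R result, T value);
    R combine(R result, R other);
}

// An A reducer: plain HashMap, so incorporation is not concurrent-safe,
// but it supports creating and merging multiple containers.
class WordCountReducer implements Reducer<String, Map<String, Integer>> {
    public boolean isConcurrent() { return false; }
    public Map<String, Integer> makeResult() { return new HashMap<>(); }
    public void accumulate(Map<String, Integer> result, String value) {
        result.merge(value, 1, Integer::sum);
    }
    public Map<String, Integer> combine(Map<String, Integer> result,
                                        Map<String, Integer> other) {
        other.forEach((k, v) -> result.merge(k, v, Integer::sum));
        return result;
    }
}

public class ReducerDemo {
    public static void main(String[] args) {
        Reducer<String, Map<String, Integer>> r = new WordCountReducer();
        Map<String, Integer> m = r.makeResult();
        for (String s : Arrays.asList("a", "b", "a"))
            r.accumulate(m, s);
        System.out.println(m);   // counts: a=2, b=1
    }
}
```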

A reducers return 'false' for isConcurrent; B reducers return 'true'.

What was Concurrent{Reducer,Tabulator,Accumulator} goes away.

Now, in addition to the various forms of reduce(), we add overloaded:

   reduce(Reducer)
   reduceUnordered(Reducer)

You will get a B reduction if (a) you selected reduceUnordered and (b) 
your reducer is concurrent.  Otherwise you will get an A reduction.
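The library-side dispatch is then trivial; a sketch (sequential stand-ins for what would really be parallel implementations, helper names mine):

```java
import java.util.*;

public class DispatchSketch {
    // The Reducer interface as sketched in this message.
    interface Reducer<T, R> {
        boolean isConcurrent();
        R makeResult();
        void accumulate(R result, T value);
        R combine(R result, R other);
    }

    // A B reduction happens only when the caller chose the unordered entry
    // point AND the reducer declares itself concurrent; otherwise fall back
    // to the safe ordered A reduction.
    static <T, R> R reduceUnordered(List<T> source, Reducer<T, R> reducer) {
        if (reducer.isConcurrent())
            return concurrentReduce(source, reducer);  // B: one shared container
        return orderedReduce(source, reducer);         // A: ordered fallback
    }

    static <T, R> R reduce(List<T> source, Reducer<T, R> reducer) {
        return orderedReduce(source, reducer);         // ordered entry: always A
    }

    // Stand-in for the parallel fold-and-merge of an A reduction.
    static <T, R> R orderedReduce(List<T> source, Reducer<T, R> reducer) {
        R result = reducer.makeResult();
        for (T t : source) reducer.accumulate(result, t);
        return result;
    }

    // Stand-in for the many-threads-into-one-container B reduction.
    static <T, R> R concurrentReduce(List<T> source, Reducer<T, R> reducer) {
        R result = reducer.makeResult();
        for (T t : source) reducer.accumulate(result, t);
        return result;
    }

    // A non-concurrent reducer, to show the unordered request falling back.
    static class ToListReducer implements Reducer<Integer, List<Integer>> {
        public boolean isConcurrent() { return false; }
        public List<Integer> makeResult() { return new ArrayList<>(); }
        public void accumulate(List<Integer> result, Integer value) { result.add(value); }
        public List<Integer> combine(List<Integer> a, List<Integer> b) {
            a.addAll(b); return a;
        }
    }

    public static void main(String[] args) {
        // Unordered request + non-concurrent reducer = safe ordered A reduction.
        System.out.println(reduceUnordered(Arrays.asList(1, 2, 3), new ToListReducer()));
    }
}
```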

This is nice because the knowledge of properties of "user doesn't care 
about order" and "target supports concurrent insertion" naturally live 
in different places; this separates them properly.  The latter is a 
property of the reducer implementation; the former only the user knows 
about and therefore should live at the call site.  Each contributes 
their bit and can mostly remain ignorant of the other; the library 
combines these bits, and if both are present, you get a B reduction.

The reduceUnordered() method can be cast as an optimization to reduce(); 
it is always correct to use reduce(), but may be faster to use 
reduceUnordered(), as long as you are willing to forgo order and the 
reducer is cooperative.

In neither case (assuming a properly implemented reducer) will you ever 
get a thread-unsafe result; if you ask for an unordered reduction with a 
nonconcurrent reducer, you get a safe ordered reduction instead, which 
might be slower (or faster.)

Pros:
  - ConcurrentReducer goes away
  - .unordered() goes away
  - Properly separates order-sensitivity from concurrency

Cons:
  - Two variations on various aspects of reducing are surfaced in the 
API (unordered and concurrent) that will make users say "huh?" (but if 
you get it wrong or choose to stay ignorant, you still get the right 
answer.)

Implementation coming.



More information about the lambda-libs-spec-experts mailing list