Extending Collector to handle a post-transform
brian.goetz at oracle.com
Fri Jun 14 14:01:13 PDT 2013
BTW, this notion of a parallel reduction as a quad of functions:
(initial-result, accumulate-element, merge-result, final-transform)
shows up in a lot of places. Here are just two that were pointed out to
us as we explored this feature:
User-defined aggregates in MS SQL Server:
(Thanks Erik for this pointer.)
Ypnos: declarative, parallel structured grid programming.
(http://doi.acm.org/10.1145/1708046.1708053), which describes a
Haskell-hosted EDSL for parallel stencil computations:
Some reductions generate values of a different type to the element type
of a grid. A structure called a Reducer packs together a number of
functions for parallel reduction under reduction operators of this type.
The mkReducer constructor builds a Reducer, taking four parameters:
• A function reducing an element and partially-reduced value to another
partially-reduced value: (a → b → b)
• A function combining two partially-reduced values, possibly from two
reduction processes on subgrids: (b → b → b)
• An initial partial result: b
• A final conversion function that converts the partial-result to a
final value: (b → c).
(Thanks Guy for this pointer.)
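As a rough sketch of how the same four-part shape might look in Java, here is a mean-of-integers reduction written with the Collector.of factory from java.util.stream as eventually released (the API under discussion in this thread may differ in detail). The class name QuadReduction and the method mean() are made up for illustration; only Collector.of and parallelStream().collect() are real library calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class QuadReduction {
    // Hypothetical example: the mean of a stream of Integers expressed as
    // (initial-result, accumulate-element, merge-result, final-transform).
    // The intermediate type is long[] {sum, count}; the result type is Double.
    public static Collector<Integer, long[], Double> mean() {
        return Collector.of(
            () -> new long[2],                                    // initial-result
            (acc, x) -> { acc[0] += x; acc[1]++; },               // accumulate-element
            (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },  // merge-result
            acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]   // final-transform
        );
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        // Partial {sum, count} pairs from each subtask are merged, then the
        // final transform turns the merged pair into the mean.
        double m = xs.parallelStream().collect(mean());
        System.out.println(m); // prints 2.5
    }
}
```

Without the fourth function, the caller would get back the raw {sum, count} pair and have to do the division separately.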
In addition, we got requests for this feature from the Oracle "Sumatra"
team, which is exploring the practicality of transparently translating
Java bulk operations to run on GPUs. The notions from the "Ypnos" paper
above show up all over the GPGPU literature.
On 6/12/2013 11:39 PM, Mike Duigou wrote:
> On Jun 11 2013, at 10:04 , Brian Goetz wrote:
>>> What's bad?
>>> - More generics in Collector signatures. For Collectors that don't want to export their intermediate type, they are declared as Collector<T, ?, R>, which users may find disturbing. (The obvious attempts to make the extra type arg go away don't work.)
> For me this extra type parameter for the intermediary on Collector is no different than the extra type param on BaseStream. Any time you have a type variable that is not part of the user's generification it's going to feel uncomfortable. For Collector the extra param goes largely unnoticed, though, because Collector is rarely assigned. Collector is mostly used as an argument, and in this case the wildcard is invisible. The types (and wildcards) just flow through unobserved. This seems fine, and overall it's a huge benefit to handle the post-transform in the Collector.
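To make the "assigned vs. passed as an argument" distinction concrete, here is a small sketch using Collectors.joining from the released java.util.stream API (again, the API at the time of this thread may differ). The class name WildcardCollector, the field JOINED, and the method joinInline are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class WildcardCollector {
    // The rarer case: the Collector is assigned, so the declaration has to
    // spell out the hidden intermediate type as a wildcard.
    public static final Collector<CharSequence, ?, String> JOINED =
            Collectors.joining(", ");

    // The common case: the Collector is passed straight to collect(), and
    // the intermediate type never appears in user code.
    public static String joinInline(List<String> names) {
        return names.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        System.out.println(joinInline(names));                // prints a, b, c
        System.out.println(names.stream().collect(JOINED));   // prints a, b, c
    }
}
```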