accumulate locally and mutatively but combine concurrently (a use case)

Fri May 10 16:59:38 PDT 2013

On 05/10/13 18:44, Brian Goetz wrote:

>
> But..there are also (courtesy of Doug) some data structures that are
> designed for concurrent modification, like ConcurrentHashMap.  In some
> cases, doing a groupBy by blasting elements into the same CHM from many
> threads at once is more performant than making a bunch of isolated small
> maps and merging them, because merge-by-key is not something that
> HashMap/TreeMap do all that well.

John: Here's another way of thinking about this. As a reduce/collect
proceeds, the execution framework periodically tells stream code that
it can merge. It takes some effort to do this, in addition to the
merge effort. These together usually  cost more than it would take
to do fine-grained per-element merges in a concurrent
data structure, in essence skipping all the execution sync control,
plus a lot of the copying.

Parallel reduction/accumulation is of course an old and well-studied
area, so there are many known good ways and even more known bad ways
to handle particular cases. But for collecting into hash tables,
we're pretty sure that using CHM in CONCURRENT mode is the best
available choice.

-Doug