GroupByOp.evaluateParallel

Thu Oct 4 03:37:56 PDT 2012

On Oct 3, 2012, at 11:37 PM, Brian Goetz <brian.goetz at Oracle.COM> wrote:
> Another implementation approach would be to only have one Map, a ConcurrentHashMap, whose keys are StreamBuilders.  The first thread to discover a key does a putIfAbsent(k, new SB()).  Insertion proceeds as 
> 
>  synchronized(sb) {
>    sb.add(v);
>  }
> 
> If the classifier function spreads the keys broadly then contention may not be a huge issue, and then there is no reduction phase at all. 
> 

I was wondering about using CHM and the contention on the value; I was going out of my way to try and avoid using "synchronized" :-)

It is certainly much simpler, and with this approach we can also explicitly cache and reuse the sink chain.
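
For reference, the single-map approach can be sketched roughly as below. This is only an illustration and not the code in the webrev: ArrayList stands in for StreamBuilder, the class and method names are made up, and it is written against the java.util.stream API as it later shipped rather than the prototype API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one shared ConcurrentHashMap, no reduction phase.
class ChmGroupBySketch {

    // ArrayList is an assumed stand-in for StreamBuilder.
    static <T, K> Map<K, List<T>> groupBy(Collection<T> input,
                                          Function<? super T, ? extends K> classifier) {
        ConcurrentHashMap<K, List<T>> map = new ConcurrentHashMap<>();
        input.parallelStream().forEach(v -> {
            K key = classifier.apply(v);
            List<T> bucket = map.get(key);
            if (bucket == null) {
                // The first thread to discover the key installs the bucket;
                // a thread that loses the race uses the winner's bucket.
                List<T> fresh = new ArrayList<>();
                bucket = map.putIfAbsent(key, fresh);
                if (bucket == null) {
                    bucket = fresh;
                }
            }
            // ArrayList is not thread-safe, so insertion locks on the bucket.
            synchronized (bucket) {
                bucket.add(v);
            }
        });
        return map;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        Map<Integer, List<Integer>> groups = groupBy(input, x -> x % 10);
        // Bucket sizes are deterministic; the order inside each bucket is not.
        groups.forEach((k, v) -> System.out.println(k + " -> " + v.size() + " elements"));
    }
}
```

Contention is per bucket, so a classifier that spreads the keys broadly should rarely have two threads waiting on the same lock.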

However, it does bring up two issues:

1) null keys, which would be disallowed for the parallel case when using CHM but not necessarily for the sequential case; and

2) order of elements in the StreamBuilder. When using CHM the order of elements in the collection may not relate to the encounter order, whereas in the reducing implementation that order is preserved, so if the input is sorted then so are the collections. I suppose that since Collection is specified as the map value we are not making any guarantee about order.
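
Both issues are easy to demonstrate in isolation. The sketch below is hypothetical (the class and helper names are made up, and it is not the GroupByOpTest change itself): it shows that ConcurrentHashMap rejects a null key where HashMap accepts one, and one possible way to compare two collections without relying on order, by counting occurrences of each element.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the two caveats; names are illustrative only.
class GroupByCaveatsSketch {

    // Issue 1: ConcurrentHashMap throws NullPointerException for a null key,
    // whereas HashMap permits one.
    static boolean acceptsNullKey(Map<Object, Object> map) {
        try {
            map.putIfAbsent(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Issue 2: compare two collections as multisets, ignoring element order.
    static <T> boolean sameElements(Collection<T> a, Collection<T> b) {
        return counts(a).equals(counts(b));
    }

    // Counts how many times each element occurs.
    private static <T> Map<T, Integer> counts(Collection<T> c) {
        Map<T, Integer> m = new HashMap<>();
        for (T t : c) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println("HashMap accepts null key: "
                + acceptsNullKey(new HashMap<>()));
        System.out.println("ConcurrentHashMap accepts null key: "
                + acceptsNullKey(new ConcurrentHashMap<>()));
        System.out.println("Same elements ignoring order: "
                + sameElements(Arrays.asList(1, 2, 2, 3), Arrays.asList(2, 3, 1, 2)));
    }
}
```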

Here's a new webrev (based on the previous one, sorry it was just quicker/easier that way) to compare and contrast:

  http://cr.openjdk.java.net/~psandoz/lambda/pargroupby2/webrev/

Notice the change in GroupByOpTest to not rely on order when comparing the Collection<T> values.

Paul.