Setting of UNORDERED on concurrent collectors

Mon Apr 8 13:07:40 PDT 2013

>
>
> That would cause us to access a non-thread-safe HashMap concurrently from
> multiple threads.

I assumed that a concurrent collection would use a concurrent map.  Isn't
 it reasonable to assume that operations on a parallel stream will use
thread-safe collections?

BTW, the other downside of the current state of affairs is experienced by
the user who specifies a parallel stream and even declares it unordered,
but still gets a non-concurrent collection because groupingBy was used
instead of groupingByConcurrent.

In your examples, the difference between the two results is primarily one
of order, not concurrency.  Can we reflect this choice more directly in the
API?

Joe

On Mon, Apr 8, 2013 at 12:50 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> What is groupingByConcurrent good for?  What's the difference between
>> parallel and concurrent in this context?
>>
>
> For sequential streams, concurrent is irrelevant.  So this is only
> relevant for parallel streams.
>
> When doing a reduction on a parallel stream, there are two obvious ways to
> do it:
>
>  - Partition the input, reduce the chunks separately into isolated
> subresults, then combine the subresults "up the tree" into a complete
> result (call this a "traditional" parallel reduction)
>
>  - Use some sort of thread-safe combiner, and blast input elements at some
> shared combiner from all threads (call this a "concurrent" reduction.)
>  This is more like a forEach than a reduce.  Requirements for this to be
> safe include: the combiner must be thread-safe, and the user must not care
> about order, since there's no telling in what order the elements will be
> blasted.
>
> When the reduction is a groupBy into a Map, this can make a big difference
> because of the merging performance of HashMap.
>
> The traditional reduction looks like this:
>  - create a HashMap per partition
>  - Insert the elements of this partition into the HashMap
>  - Go up the tree, merging two HashMaps into one.  This involves iterating
> a key-by-key merge.  This is slow.
>
> The concurrent reduction looks like:
>  - create one ConcurrentHashMap
>  - Blast elements into it using atomic methods like putIfAbsent
>  - Return that, no merging
>
> In most reasonable cases, concurrent parallel reduction with CHM blows
> away traditional parallel reduction with HashMap.  On the other hand, one
> of the casualties of the concurrent approach is ordering.
>
> If your input is (ordered):
>
>   [ 1, 2, 3, 4, 5, 6, 7, 8 ]
>
> and your classifier function is:
>
>   e % 2
>
> then the traditional approach must yield:
>  { 0 => [ 2, 4, 6, 8 ],
>    1 => [ 1, 3, 5, 7] }
>
> but the concurrent approach could yield:
>
>  { 0 => [ 6, 2, 4, 8 ],
>    1 => [ 7, 1, 3, 5 ] }
>
> So the question is, when confronted with an obvious desire to use a
> concurrent-safe collector, do we infer that the user must not care about
> ordering?
>
>
>  The most succinct indication of its function is:
>>
>>     The collect(Collector) method currently performs a
>>     concurrent collection when all of the following are true:
>>       - the stream is parallel
>>       - the collector is *concurrent*
>>       - the collector is unordered OR the stream is unordered
>>
>
> This is the current rule about whether or not collect() does a concurrent
> reduction.  My question here is whether we wish to make our existing
> concurrent collectors always be unordered, so that the last bullet is
> trivially satisfied for the built-in concurrent collectors.
>
>
>  In other words, *if* I happen to use groupingByConcurrent *then* maybe a
>> concurrent collection will be performed, but maybe not, depending on a
>> couple other factors...
>>
>
> That is the current state of affairs.
>
>
>  Can we make this simpler and more intuitive/predictable?  I realize
>> that's what you're addressing now, but can't we go a lot farther?
>>
>> Can we, say, get rid of groupingByConcurrent and just assume that if the
>> stream is parallel?  What do we lose?  Do we lose any functionality that
>> can't be derived another way?
>>
>
> That would cause us to access a non-thread-safe HashMap concurrently from
> multiple threads.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130408/71807564/attachment-0001.html