Setting of UNORDERED on concurrent collectors
Joe Bowbeer
joe.bowbeer at gmail.com
Mon Apr 8 13:41:49 PDT 2013
>
> In other words, as long as you can hold relevant state constant for the
> duration of your query, you get all this parallelism for free without
> having to think about thread safety or use thread-safe collections.
I'm using the forms of collect that hide the collections completely (except
as a return type). I was only thinking about the order vs unorder and
parallel vs sequential aspects -- and I'd prefer to keep it that way. So,
for example:
collect(unordered+parallel) should perform a concurrent collection?
(But you've already indicated that yes I do, in addition, need to think
about the collection type in this case even if I don't handle the
construction, right?)
Whereas your question is:
collectConcurrent(ordered+parallel) should disregard order?
I'm OK with this, but I wish groupingByConcurrent could go away.
--Joe
On Mon, Apr 8, 2013 at 1:19 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> I assumed that a concurrent collection would use a concurrent map.
>> Isn't it reasonable to assume that operations on a parallel stream
>> will use thread-safe collections?
>>
>
> ABSOLUTELY NOT!
>
> Any non-thread-safe collection can be used as a source for a parallel
> stream, without any more synchronization than is already implicit in the FJ
> library. (Some may partition better than others, though; linked lists are
> never going to be parallel screamers.)
>
> Similarly, any reduction can be done in parallel into a non-thread-safe
> collection. Many of our collectors use non-thread-safe result containers
> like ArrayList, StringBuilder, or HashMap but are still perfectly
> parallel-safe. The library provides the necessary isolation, so that these
> non-thread-safe containers are serially thread-confined and still we can
> get decent parallelism.
>
> The only thing the user has to be careful of in order to not undermine
> this wonderful gift is to avoid interference. Interference includes things
> like:
> - Modifying the source while you're doing a stream operation on it.
> - Using "lambdas" that depend on state that might be modified during the
> course of the stream operation.
>
> In other words, as long as you can hold relevant state constant for the
> duration of your query, you get all this parallelism for free without
> having to think about thread safety or use thread-safe collections.
> Effective immutability is a very powerful thing.
>
>
> BTW, the other downside of the current state of affairs is experienced
>> by the user who specifies a parallel stream and even declares it
>> unordered, but still gets a non-concurrent collection because groupingBy
>> was used instead of groupingByConcurrent.
>>
>
> Right. But he will still get a parallel reduction. It just may be that
> in some cases, he gets a reduction that parallelizes poorly, because the
> combine step of the reduction happens to be way more expensive that the
> accumulate step, as it is when the combine step is a merge-maps-by-key.
> (We have no way of knowing this a priori. Some non-concurrent reductions
> will parallelize with fine performance and have no need of the additional
> benefit that a concurrent collection gives.)
>
>
> In your examples, the difference between the two results is primarily
>> one of order, not concurrency. Can we reflect this choice more directly
>> in the API?
>>
>
> We used to have that -- the selection of ordering (collect vs
> collectUnordered) was orthogonal to the collector, and we did a concurrent
> collection if we were in the (unordered, concurrent) quadrant. That's the
> most explicit.
>
>
>
More information about the lambda-libs-spec-observers
mailing list