Setting of UNORDERED on concurrent collectors

Mon Apr 8 13:41:49 PDT 2013

>
> In other words, as long as you can hold relevant state constant for the
> duration of your query, you get all this parallelism for free without
> having to think about thread safety or use thread-safe collections.

I'm using the forms of collect that hide the collections completely (except
as a return type).  I was only thinking about the ordered vs unordered and
parallel vs sequential aspects -- and I'd prefer to keep it that way.  So,
for example:

collect(unordered+parallel) should perform a concurrent collection?

(But you've already indicated that yes I do, in addition, need to think
about the collection type in this case even if I don't handle the
construction, right?)

Whereas your question is:

collectConcurrent(ordered+parallel) should disregard order?

I'm OK with this, but I wish groupingByConcurrent could go away.
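
To make the groupingBy / groupingByConcurrent distinction concrete, here is a minimal sketch. It assumes the collector names as they later shipped in java.util.stream.Collectors for Java 8, not the draft collectConcurrent / collectUnordered forms discussed in this thread. The class name, word list, and method names are hypothetical, chosen only for illustration. Both methods run on a parallel stream and yield the same per-key counts; only the second accumulates into a single shared ConcurrentMap instead of merging per-thread maps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class GroupingSketch {
    // Hypothetical sample data, not taken from the thread.
    static final List<String> WORDS =
            Arrays.asList("a", "bb", "cc", "ddd", "eee", "fff");

    // Parallel reduction with a non-concurrent collector: each worker
    // fills its own map, and the partial maps are merged by key.
    static Map<Integer, Long> countByLength() {
        return WORDS.parallelStream()
                .collect(Collectors.groupingBy(String::length,
                                               Collectors.counting()));
    }

    // Concurrent collection: all workers accumulate into one shared
    // ConcurrentMap, so no merge-maps-by-key step is needed.
    static ConcurrentMap<Integer, Long> countByLengthConcurrent() {
        return WORDS.parallelStream()
                .unordered()
                .collect(Collectors.groupingByConcurrent(String::length,
                                                         Collectors.counting()));
    }

    public static void main(String[] args) {
        // Counts do not depend on encounter order, so the two maps agree.
        System.out.println(countByLength().equals(countByLengthConcurrent()));
    }
}
```

A counting downstream collector is used here so that the result is the same whether or not encounter order is preserved; with a list-valued downstream the element order within each group could differ between the two.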

--Joe

On Mon, Apr 8, 2013 at 1:19 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

>> I assumed that a concurrent collection would use a concurrent map.
>> Isn't it reasonable to assume that operations on a parallel stream
>> will use thread-safe collections?
>>
>
> ABSOLUTELY NOT!
>
> Any non-thread-safe collection can be used as a source for a parallel
> stream, without any more synchronization than is already implicit in the FJ
> library.  (Some may partition better than others, though; linked lists are
> never going to be parallel screamers.)
>
> Similarly, any reduction can be done in parallel into a non-thread-safe
> collection.  Many of our collectors use non-thread-safe result containers
> like ArrayList, StringBuilder, or HashMap but are still perfectly
> parallel-safe.  The library provides the necessary isolation, so that these
> non-thread-safe containers are serially thread-confined and still we can
> get decent parallelism.
>
> The only thing the user has to be careful of in order to not undermine
> this wonderful gift is to avoid interference.  Interference includes things
> like:
>  - Modifying the source while you're doing a stream operation on it.
>  - Using "lambdas" that depend on state that might be modified during the
> course of the stream operation.
>
> In other words, as long as you can hold relevant state constant for the
> duration of your query, you get all this parallelism for free without
> having to think about thread safety or use thread-safe collections.
> Effective immutability is a very powerful thing.
>
>
>> BTW, the other downside of the current state of affairs is experienced
>> by the user who specifies a parallel stream and even declares it
>> unordered, but still gets a non-concurrent collection because groupingBy
>> was used instead of groupingByConcurrent.
>>
>
> Right.  But he will still get a parallel reduction.  It just may be that
> in some cases, he gets a reduction that parallelizes poorly, because the
> combine step of the reduction happens to be way more expensive than the
> accumulate step, as it is when the combine step is a merge-maps-by-key.
> (We have no way of knowing this a priori.  Some non-concurrent reductions
> will parallelize with fine performance and have no need of the additional
> benefit that a concurrent collection gives.)
>
>
>> In your examples, the difference between the two results is primarily
>> one of order, not concurrency.  Can we reflect this choice more directly
>> in the API?
>>
>
> We used to have that -- the selection of ordering (collect vs
> collectUnordered) was orthogonal to the collector, and we did a concurrent
> collection if we were in the (unordered, concurrent) quadrant.  That's the
> most explicit.
>
>
>
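
The point quoted above, that a parallel reduction can safely target a non-thread-safe container, can be sketched as follows. This is an illustration under assumptions, not code from the thread: it uses the three-argument Stream.collect as it shipped in Java 8, and the class and method names are hypothetical. Each worker gets its own ArrayList from the supplier, and the library combines the partial lists, so no list is ever touched by two threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConfinedReductionSketch {
    // Parallel reduction into a plain ArrayList. The explicit type
    // argument pins the result container type to ArrayList<Integer>.
    static List<Integer> parallelSquares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .<ArrayList<Integer>>collect(ArrayList::new,      // one container per worker
                                             ArrayList::add,      // accumulate into that container
                                             ArrayList::addAll);  // combine partial results
    }

    // Sequential reference result for comparison.
    static List<Integer> sequentialSquares(int n) {
        return IntStream.range(0, n)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The stream is ordered, so the parallel result keeps encounter order.
        System.out.println(parallelSquares(1000).equals(sequentialSquares(1000)));
    }
}
```

The lambdas here depend on no mutable shared state and the source is not modified during the operation, which is the non-interference condition described in the quoted message.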