Setting of UNORDERED on concurrent collectors

Mon Apr 8 13:19:16 PDT 2013

> I assumed that a concurrent collection would use a concurrent map.
>   Isn't  it reasonable to assume that operations on a parallel stream
> will use thread-safe collections?

ABSOLUTELY NOT!

Any non-thread-safe collection can be used as a source for a parallel 
stream, without any more synchronization than is already implicit in the 
FJ library.  (Some may partition better than others, though; linked 
lists are never going to be parallel screamers.)

Similarly, any reduction can be done in parallel into a non-thread-safe 
collection.  Many of our collectors use non-thread-safe result 
containers like ArrayList, StringBuilder, or HashMap but are still 
perfectly parallel-safe.  The library provides the necessary isolation, 
so that these non-thread-safe containers are serially thread-confined 
and still we can get decent parallelism.

The only thing the user has to be careful of in order to not undermine 
this wonderful gift is to avoid interference.  Interference includes 
things like:
  - Modifying the source while you're doing a stream operation on it.
  - Using "lambdas" that depend on state that might be modified during 
the course of the stream operation.

In other words, as long as you can hold relevant state constant for the 
duration of your query, you get all this parallelism for free without 
having to think about thread safety or use thread-safe collections. 
Effective immutability is a very powerful thing.

> BTW, the other downside of the current state of affairs is experienced
> by the user who specifies a parallel stream and even declares it
> unordered, but still gets a non-concurrent collection because groupingBy
> was used instead of groupingByConcurrent.

Right.  But he will still get a parallel reduction.  It just may be that 
in some cases, he gets a reduction that parallelizes poorly, because the 
combine step of the reduction happens to be way more expensive that the 
accumulate step, as it is when the combine step is a merge-maps-by-key. 
  (We have no way of knowing this a priori.  Some non-concurrent 
reductions will parallelize with fine performance and have no need of 
the additional benefit that a concurrent collection gives.)

> In your examples, the difference between the two results is primarily
> one of order, not concurrency.  Can we reflect this choice more directly
> in the API?

We used to have that -- the selection of ordering (collect vs 
collectUnordered) was orthogonal to the collector, and we did a 
concurrent collection if we were in the (unordered, concurrent) 
quadrant.  That's the most explicit.