Setting of UNORDERED on concurrent collectors
Brian Goetz
brian.goetz at oracle.com
Mon Apr 8 13:19:16 PDT 2013
> I assumed that a concurrent collection would use a concurrent map.
> Isn't it reasonable to assume that operations on a parallel stream
> will use thread-safe collections?
ABSOLUTELY NOT!
Any non-thread-safe collection can be used as a source for a parallel
stream, without any more synchronization than is already implicit in the
FJ library. (Some may partition better than others, though; linked
lists are never going to be parallel screamers.)
Similarly, any reduction can be done in parallel into a non-thread-safe
collection. Many of our collectors use non-thread-safe result
containers like ArrayList, StringBuilder, or HashMap but are still
perfectly parallel-safe. The library provides the necessary isolation,
so that these non-thread-safe containers are serially thread-confined
and still we can get decent parallelism.
The only thing the user has to be careful of in order to not undermine
this wonderful gift is to avoid interference. Interference includes
things like:
- Modifying the source while you're doing a stream operation on it.
- Using "lambdas" that depend on state that might be modified during
the course of the stream operation.
In other words, as long as you can hold relevant state constant for the
duration of your query, you get all this parallelism for free without
having to think about thread safety or use thread-safe collections.
Effective immutability is a very powerful thing.
> BTW, the other downside of the current state of affairs is experienced
> by the user who specifies a parallel stream and even declares it
> unordered, but still gets a non-concurrent collection because groupingBy
> was used instead of groupingByConcurrent.
Right. But he will still get a parallel reduction. It just may be that
in some cases, he gets a reduction that parallelizes poorly, because the
combine step of the reduction happens to be way more expensive that the
accumulate step, as it is when the combine step is a merge-maps-by-key.
(We have no way of knowing this a priori. Some non-concurrent
reductions will parallelize with fine performance and have no need of
the additional benefit that a concurrent collection gives.)
> In your examples, the difference between the two results is primarily
> one of order, not concurrency. Can we reflect this choice more directly
> in the API?
We used to have that -- the selection of ordering (collect vs
collectUnordered) was orthogonal to the collector, and we did a
concurrent collection if we were in the (unordered, concurrent)
quadrant. That's the most explicit.
More information about the lambda-libs-spec-experts
mailing list