Setting of UNORDERED on concurrent collectors
Brian Goetz
brian.goetz at oracle.com
Tue Apr 16 12:47:32 PDT 2013
We never converged on this one. Here's another stab at framing the
problem. (I'm pretty much ready to time out and make these collectors
declare UNORDERED unless someone can convince me otherwise.)
Streams consist of source + intermediate ops + terminal.
Denote ordered/unordered variants of these as SO/SU, IO/IU/IA
(A=agnostic), and TA/TU. We can define the ordered-ness of any stream
pipeline as follows:
ordered(SO) = true
ordered(SU) = false
ordered(X+IO) = true
ordered(X+IU) = false
ordered(X+IA) = ordered(X)
ordered(X+TA) = ordered(X)
ordered(X+TU) = false
A concurrent calculation may be performed if the stream is unordered
*and* the destination is concurrent.
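To make the rules above concrete, here is a minimal sketch that folds them over a pipeline listed source-first. The class name OrderednessModel, the Stage enum, and the mayRunConcurrently helper are hypothetical names used only for illustration; none of them is part of the streams API.

```java
import java.util.Arrays;
import java.util.List;

public class OrderednessModel {
    // Stage kinds, named after the abbreviations used in the message (hypothetical enum).
    enum Stage { SO, SU, IO, IU, IA, TA, TU }

    // Applies the ordered(...) rules stage by stage, starting from the source.
    static boolean ordered(List<Stage> pipeline) {
        boolean ordered = false; // an empty pipeline is treated as unordered here (an assumption)
        for (Stage s : pipeline) {
            switch (s) {
                case SO:
                case IO:
                    ordered = true;   // ordered source or ordering-imposing op
                    break;
                case SU:
                case IU:
                case TU:
                    ordered = false;  // unordered source, op, or terminal
                    break;
                case IA:
                case TA:
                    break;            // agnostic: inherit ordered(X)
            }
        }
        return ordered;
    }

    // A concurrent calculation is allowed only if the stream is unordered
    // *and* the destination is concurrent.
    static boolean mayRunConcurrently(List<Stage> pipeline, boolean destinationIsConcurrent) {
        return !ordered(pipeline) && destinationIsConcurrent;
    }

    public static void main(String[] args) {
        List<Stage> withTA = Arrays.asList(Stage.SO, Stage.IA, Stage.TA);
        List<Stage> withTU = Arrays.asList(Stage.SO, Stage.IA, Stage.TU);
        System.out.println(mayRunConcurrently(withTA, true)); // false
        System.out.println(mayRunConcurrently(withTU, true)); // true
    }
}
```

The two pipelines in main differ only in the terminal marking, which is exactly the TA-versus-TU choice under discussion for an ordered source feeding a concurrent destination.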
Collectors like toSet() are marked TU; collectors like toList() are
marked TA. Collectors like groupingByConcurrent will definitely be
marked concurrent. The question is, should they be marked TA or TU?
Either choice is defensible.
Note that collectors individually get to choose whether they are TA or
TU. Choices we make for our canned collectors need not affect
user-written collectors. The model can handle both and users can
predict the behavior of both.
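One way to see which choice a given collector makes is to inspect its characteristics. The sketch below assumes a released JDK 8 or later, where Collector.Characteristics has UNORDERED and CONCURRENT constants; the class name CollectorFlags and its helper methods are hypothetical. What it prints reflects the JDK it runs on, not necessarily the API as it stood when this message was written.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CollectorFlags {
    // True if the collector declares itself UNORDERED (TU in the notation above).
    static boolean isUnordered(Collector<?, ?, ?> c) {
        return c.characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    // True if the collector declares a concurrent destination.
    static boolean isConcurrent(Collector<?, ?, ?> c) {
        return c.characteristics().contains(Collector.Characteristics.CONCURRENT);
    }

    public static void main(String[] args) {
        Collector<String, ?, ConcurrentMap<Integer, List<String>>> grouping =
                Collectors.groupingByConcurrent(String::length);

        System.out.println("toSet  UNORDERED:  " + isUnordered(Collectors.toSet()));
        System.out.println("toList UNORDERED:  " + isUnordered(Collectors.toList()));
        System.out.println("groupingByConcurrent CONCURRENT: " + isConcurrent(grouping));
        System.out.println("groupingByConcurrent UNORDERED:  " + isUnordered(grouping));
    }
}
```

A user-written collector can be checked the same way, since each collector reports its own characteristics.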
On 4/8/2013 3:08 PM, Brian Goetz wrote:
> Now that we've removed collectUnordered in favor of a more general
> unordered() op, we should consider what should be the default behavior for:
>
> orderedStream.collect(groupingByConcurrent(f))
>
> Currently, the collect-to-ConcurrentMap collectors are *not* defined as
> UNORDERED. Which means, if the stream is ordered, we will attempt to do
> an ordered collection anyway, which is incompatible with concurrent
> collection, and we will do the plain old partition-and-merge with
> ConcurrentMap.
>
> Here, we have competing evidence for the user intent. On the one hand,
> the stream is ordered, and the user could have chosen unordered. On the
> other, the user has asked for concurrent grouping. It's not 100% obvious
> which should win.
>
> On the other hand, ordered map collections are so awful that users will
> almost certainly be unhappy with the performance if they forget to say
> unordered here in the parallel case (and it makes no difference in the
> sequential case.) So I'm inclined to make groupingByConcurrent /
> toConcurrentMap be UNORDERED collections.
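For reference, the explicit opt-in mentioned in the quoted message looks roughly like the sketch below: an ordered source (a List) is marked unordered() before collecting with groupingByConcurrent. The class name UnorderedGrouping, the groupByLength helper, and the sample data are hypothetical, and the code assumes a released JDK 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class UnorderedGrouping {
    // Groups words by length; unordered() states explicitly that encounter
    // order does not matter for this collection.
    static ConcurrentMap<Integer, List<String>> groupByLength(List<String> words) {
        return words.parallelStream()
                    .unordered()
                    .collect(Collectors.groupingByConcurrent(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "d", "eee");
        // The order of elements inside each group is not guaranteed.
        System.out.println(groupByLength(words));
    }
}
```

If the collector itself declares UNORDERED, the unordered() call becomes redundant; otherwise it is what the user has to remember to write.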