Encounter order
Brian Goetz
brian.goetz at oracle.com
Wed Oct 24 14:12:19 PDT 2012
Right. While we could choose between Kind A and Kind B, I think
choosing kind B is kind of silly. As you say, it already alters the
semantics and would call for a different API -- results automatically
become multisets. Since Java doesn't even have a multiset type, this
will be foreign -- and also (IMO) not very useful.
The general idea here is that parallelism is an optimization, which can
affect side-effects, but not the result. (This is the choice that
Scala, Clojure, Haskell, Mathematica, Fortress, and others have taken.)
Since Java does so much with side-effects, sometimes it is hard to
separate the result from the effects, but we should try anyway. Here's
an example:
stream.map(x -> { System.out.println(x); return x; })
.toArray();
Here, I would say that in either a serial or parallel stream, this
should result in a copy of the stream source, but in parallel, the
ordering (and more) of the side-effects (println) are unpredictable.
(Operations like forEach are all side-effect.)
What this means is that the question of "what should parallel toArray
do" isn't the right question; the question is "what should toArray do".
For example, one question we could ask is "what happens when you call
reduce on a stream without an encounter order." One possibility is
"hope that the reducer is commutative"; another is to treat reduction on
a order-less stream as an error. Unfortunately we don't have ways in
the type system to tag lambdas as commutative or associative, so the
user is on their own to avoid GIGO here. This isn't great, as it
requires some nonmodular reasoning on the part of the user. On the
other hand, this isn't all that different from having to know "is this
collection thread-safe or not."
On 10/24/2012 4:23 PM, Dan Smith wrote:
> On Oct 23, 2012, at 2:45 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> OK, let me try this a different way.
>>
>> Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.))
>
> Sounds like this discussion is mixing two different kinds of parallel streams.
>
> Parallel stream Kind A is a Stream that produces identical results to a serial stream, but is allowed have out-of-order and out-of-thread side effects.
>
> Parallel stream Kind B is a different entity that makes no guarantees about iteration order. This is not a Stream at all, and should be represented with a different interface; many operations in Stream should not exist in this interface, and those that remain will likely have different contracts (e.g., 'reduce' requires a commutative operator).
>
> (Aside 1: Sorry I don't have better names. :-))
> (Aside 2: I think the decision to merge serial and parallel streams under one interface was made with Kind A in mind.)
>
> The question is, I think, which kind of parallel stream we're really talking about when we say we want "parallel collections." It's possible the answer is "both," and if that's the case, I think they should be implemented separately, rather than trying to mesh the two into one. It's also possible that the performance gains enabled by Kind B don't justify its existence.
>
>> [ 1, 2, 3 ].map( x -> x*2 )
>>
>> Is there anyone claiming the answer is NOT required to be
>>
>> [ 2, 4, 6 ]
>
> You mean '[ 1, 2, 3 ].parallel().map( x -> x*2 )'. For Kind B, this means '{ 1, 2, 3 }.map( x -> x*2 )' (using braces to suggest a set).
>
>> (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.)
>
> Even more fundamentally, a Kind B stream should be modeled so that 'list.parallel()' is already a multiset, before 'map' is called.
>
> —Dan
>
More information about the lambda-libs-spec-observers
mailing list