Encounter order

Dan Smith daniel.smith at oracle.com
Wed Oct 24 15:04:54 PDT 2012


On Oct 24, 2012, at 3:12 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Right.  While we could choose between Kind A and Kind B, I think choosing kind B is kind of silly.  As you say, it already alters the semantics and would call for a different API -- results automatically become multisets.  Since Java doesn't even have a multiset type, this will be foreign -- and also (IMO) not very useful.

I think we're on the same page, but just to make sure my point is clear: choosing between Kind A and Kind B is fine; choosing both is also fine; what I would consider bad are:
- Mixing Kind A and Kind B behavior into a single type (e.g., preserving order for 'map' but not 'into')
- Making Kind B a Stream

I'm not suggesting modeling Kind B with a new full-featured Collection -- just a ParallelStream interface that is unrelated to Stream.

> The general idea here is that parallelism is an optimization, which can affect side-effects, but not the result.  (This is the choice that Scala, Clojure, Haskell, Mathematica, Fortress, and others have taken.)
> 
> Since Java does so much with side-effects, sometimes it is hard to separate the result from the effects, but we should try anyway.  Here's an example:
> 
>  stream.map(x -> { System.out.println(x); return x; })
>        .toArray();
> 
> Here, I would say that in either a serial or parallel stream, this should result in a copy of the stream source, but in parallel, the ordering (and more) of the side-effects (println) are unpredictable. (Operations like forEach are all side-effect.)
> 
> What this means is that the question of "what should parallel toArray do" isn't the right question; the question is "what should toArray do".

For Kind A, yes, absolutely.

> For example, one question we could ask is "what happens when you call reduce on a stream without an encounter order."  One possibility is "hope that the reducer is commutative"; another is to treat reduction on a order-less stream as an error.  Unfortunately we don't have ways in the type system to tag lambdas as commutative or associative, so the user is on their own to avoid GIGO here.  This isn't great, as it requires some nonmodular reasoning on the part of the user.  On the other hand, this isn't all that different from having to know "is this collection thread-safe or not."

So this is a Kind B stream.  I'm saying an error would never be what we'd want -- since this is a new type unrelated to Stream, if we don't want to support an operation, it simply doesn't exist.

On commutativity: 1) Kind A streams require associativity -- ultimately, the same problem. 2) Our functional interface story says that the type _can_ encode properties like this.  They're just informal, so the compiler can't enforce them.  (I'm not just being pedantic -- informal contracts are an important part of the language.)  The user is responsible for providing a valid AssociativeOperator when that's what is asked for -- hence, GIGO.  But I do think it's important that properties like this be expressed with the type, so that we're as clear as possible to clients about how the client's operation needs to behave (e.g., the IDE says "give me an AssociativeOperator", not "give me a binary function").

—Dan

> On 10/24/2012 4:23 PM, Dan Smith wrote:
>> On Oct 23, 2012, at 2:45 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> 
>>> OK, let me try this a different way.
>>> 
>>> Let's separate *result* from *side effects*.  Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.))
>> 
>> Sounds like this discussion is mixing two different kinds of parallel streams.
>> 
>> Parallel stream Kind A is a Stream that produces identical results to a serial stream, but is allowed have out-of-order and out-of-thread side effects.
>> 
>> Parallel stream Kind B is a different entity that makes no guarantees about iteration order.  This is not a Stream at all, and should be represented with a different interface; many operations in Stream should not exist in this interface, and those that remain will likely have different contracts (e.g., 'reduce' requires a commutative operator).
>> 
>> (Aside 1: Sorry I don't have better names. :-))
>> (Aside 2: I think the decision to merge serial and parallel streams under one interface was made with Kind A in mind.)
>> 
>> The question is, I think, which kind of parallel stream we're really talking about when we say we want "parallel collections."  It's possible the answer is "both," and if that's the case, I think they should be implemented separately, rather than trying to mesh the two into one.  It's also possible that the performance gains enabled by Kind B don't justify its existence.
>> 
>>>  [ 1, 2, 3 ].map( x -> x*2 )
>>> 
>>> Is there anyone claiming the answer is NOT required to be
>>> 
>>>  [ 2, 4, 6 ]
>> 
>> You mean '[ 1, 2, 3 ].parallel().map( x -> x*2 )'.  For Kind B, this means '{ 1, 2, 3 }.map( x -> x*2 )' (using braces to suggest a set).
>> 
>>> (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.)
>> 
>> Even more fundamentally, a Kind B stream should be modeled so that 'list.parallel()' is already a multiset, before 'map' is called.
>> 
>> —Dan
>> 



More information about the lambda-libs-spec-experts mailing list