Encounter order
Sam Pullara
spullara at gmail.com
Wed Oct 24 06:40:01 PDT 2012
[2, 4, 6] is my expectation, calculated in parallel or not.
Sam
All my photos are panoramas.
On Oct 23, 2012, at 5:01 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> OK, let me try this a different way.
>
> Let's separate *result* from *side effects*. Parallelism can change the timing of side effects, but (I argue) should not change the result of a computation; for example, summing integers in parallel should yield the same result as summing them sequentially (modulo overflow). For better or worse, we're used to a world where too much is done via side effects, so perhaps that's where all of this "no ordering" assumption is coming from? If so, let's ignore side effects for a second, since many calculations can be expressed purely functionally, such as:
>
> [ 1, 2, 3 ].map( x -> x*2 )
>
> Is there anyone claiming the answer is NOT required to be
>
> [ 2, 4, 6 ]
>
> ? Because this is what I'm hearing when I hear people (three of them, now) say "encounter order should be ignored." (Alternatively, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.)
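[For reference, Brian's expectation can be checked against the java.util.stream API as it eventually shipped in Java 8; note the shipped method names differ slightly from the prototype discussed in this thread. An ordered collect preserves encounter order even when the map runs in parallel:]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelMapOrder {
    public static void main(String[] args) {
        // Map over a list in parallel; the encounter order of the
        // source list is preserved in the result, so this is always
        // [2, 4, 6] regardless of execution timing.
        List<Integer> result = Arrays.asList(1, 2, 3)
                .parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
        System.out.println(result); // prints [2, 4, 6]
    }
}
```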
>
> On 10/23/2012 2:17 PM, Joe Bowbeer wrote:
>> This is also my expectation/intuition regarding sorted().
>>
>> On Oct 23, 2012 11:06 AM, "Andrey Breslav" <andrey.breslav at jetbrains.com> wrote:
>>
>> My first hunch is "of course parallel() has no order guarantees".
>>
>> And my hunch is that sorted() should be a terminal operation (for a
>> different reason than its interaction with parallel(), which I
>> expressed earlier on the lambda-spec-experts list),
>> so the examples presented so far do not change my view that much.
>>
>> Which means I'm OK with having no order preserved after parallel.
>>
>> But if preserving the order comes at a price so low that we can
>> afford it (performance-wise and, more importantly,
>> explanation-wise), why not?
>>
>> So the question for me is only "how much does it cost?"
>>
>> On Oct 23, 2012, at 17:43 , Brian Goetz wrote:
>>
>> >> On the other hand I might be a little upset if I have to re-sort my
>> >> billion element collection after filtering out the blue blocks ;-)
>> >
>> > Yes, this is the sort of user expectation I'm trying to make more
>> precise. It's easy to say "no order at all" (and this is fine for
>> x.parallel().forEach(...)), but I think we do have expectations of
>> order preservation, and I am trying to tease them out.
>> >
>> > Whether an encounter order is provided by the source or an
>> intermediate stage should not matter. So, if you expect:
>> >
>> > list.parallel().sort().toArray()
>> >
>> > to result in a sorted array, then I think you should also expect
>> >
>> > sortedSet.parallel().toArray()
>> >
>> > to result in a sorted array.
>> >
>> > Similarly, if you expect the first one to be sorted, I'm pretty
>> sure you expect this to be sorted too:
>> >
>> > list.sorted().filter(x -> x.color != BLUE).toArray()
>> >
>> > Which means I think you expect filter to be order-preserving.
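[The expectation that filter is order-preserving can be sketched with the released java.util.stream API (method names differ slightly from the prototype in this thread): a filter after a sort keeps the sorted encounter order, even in parallel.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterKeepsOrder {
    public static void main(String[] args) {
        // Sort, then filter out some elements (standing in for
        // "filter out the blue blocks"). The survivors stay in
        // sorted encounter order; no re-sort is needed.
        List<Integer> result = Arrays.asList(5, 3, 1, 4, 2)
                .parallelStream()
                .sorted()
                .filter(x -> x % 2 != 0)
                .collect(Collectors.toList());
        System.out.println(result); // prints [1, 3, 5]
    }
}
```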
>> >
>> >
>> > Similarly, take reduce. Typically a reducing function is
>> expected to be associative but not commutative. But that places a
>> constraint on order; we can't just feed them to the reducer in
>> random order. And I think there is no getting around this one --
>> requiring reducing functions be commutative is too strong a
>> requirement. So there's at least one example where we absolutely
>> must pay attention to order. (The cost of preserving order here is
>> a little extra bookkeeping in the decomposition; we have to keep
>> track of the order of a given node's children (e.g., left and right
>> child pointers), so the children's results can be combined properly.)
>> >
>> >> Is it feasible to identify and implement order-preserving operations
>> >> without either sacrificing direct performance (to do the op in
>> an order
>> >> preserving way) or requiring an additional sort step?
>> >
>> > We of course want to minimize the cost of preserving order where
>> needed. And many ops have friendly parallel solutions that don't
>> involve arbitrary sequencing or extra copying. For example, we can
>> parallelize list.parallel().map().toArray() perfectly (modulo some
>> assumptions about splitting), where we decompose the input list in
>> such a way that we know where in the output array each chunk is
>> going to go, and in parallel, compute and write the mapped results
>> to exactly the right place in the output array. So order
>> preservation is not always expensive. Other ops have minor costs of
>> preserving order (like the reduce example.) Still others have
>> bigger costs.
>> >
>> > But we don't want to pay for ordering where it is not wanted, and
>> this includes operations for which ordering is not valuable.
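[A minimal sketch of the decomposition Brian describes: because map takes each input index to the same output index, chunks can be computed in parallel and written directly into the right slots of a preallocated result array, with no reordering or extra copying. The helper name here is hypothetical, not part of any API.]

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class IndexedParallelMap {
    // Each worker writes out[i] for a distinct set of indices i,
    // so no synchronization or post-hoc sort is needed: the output
    // is in encounter order by construction.
    static int[] parallelMap(int[] input, IntUnaryOperator f) {
        int[] out = new int[input.length];
        IntStream.range(0, input.length)
                 .parallel()
                 .forEach(i -> out[i] = f.applyAsInt(input[i]));
        return out;
    }

    public static void main(String[] args) {
        int[] result = parallelMap(new int[]{1, 2, 3, 4}, x -> x * 10);
        System.out.println(Arrays.toString(result)); // prints [10, 20, 30, 40]
    }
}
```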
>>
>> --
>> Andrey Breslav
>> http://jetbrains.com
>> Develop with pleasure!
>>
>>
More information about the lambda-libs-spec-observers mailing list