Encounter order

Tue Oct 23 13:45:56 PDT 2012

OK, let me try this a different way.

Let's separate *result* from *side effects*.  Parallelism can change the 
timing of side-effects, but (I argue) should not change the result of a 
computation (e.g., summing integers in parallel should yield the same 
result as summing them sequentially (modulo overflow.))  For better or 
worse, we're used to a world where too much is done via side-effects, so 
perhaps that's where all of this "no ordering" assumption is coming 
from?  If so, let's ignore side effects for a second, since many 
calculations can be expressed purely functionally, such as :

   [ 1, 2, 3 ].map( x -> x*2 )

Is there anyone claiming the answer is NOT required to be

   [ 2, 4, 6 ]

?  Because this is what I'm hearing when I hear people (three of them, 
now) say "encounter order should be ignored."  (Alternately, this claim 
amounts to saying that list.parallel().map() should return a multiset 
rather than a list.)

On 10/23/2012 2:17 PM, Joe Bowbeer wrote:
> This is also my expectation/intuition regarding sorted().
>
> On Oct 23, 2012 11:06 AM, "Andrey Breslav" <andrey.breslav at jetbrains.com
> <mailto:andrey.breslav at jetbrains.com>> wrote:
>
>     My first hunch is "of course parallel() has no order guarantees".
>
>     And my hunch is that sorted() should be a terminal operation (for a
>     different reason than its interaction with parallel(), which I
>     expressed earlier on the lambda-spec-experts list),
>     so the examples presented so far do not change my view that much.
>
>     Which means I'm OK with having no order preserved after parallel.
>
>     But if preserving the order comes at a price so low that we can
>     afford it (performance-wise and, more importantly,
>     explanation-wise), why not?
>
>     So the question for me is only "how much does it cost?"
>
>     On Oct 23, 2012, at 17:43 , Brian Goetz wrote:
>
>      >> On the other hand I might be a little upset if I have to re-sort my
>      >> billion element collection after filtering out the blue blocks ;-)
>      >
>      > Yes, this is the sort of user expectation I'm trying to make more
>     precise.  It's easy to say "no order at all" (and this is fine for
>     x.parallel().forEach(...)), but I think we do have expectations of
>     order preservation, and I am trying to tease them out.
>      >
>      > Whether an encounter order is provided by the source or an
>     intermediate stage should not matter.  So, if you expect:
>      >
>      >  list.parallel().sort().toArray()
>      >
>      > to result in a sorted array, then I think you should also expect
>      >
>      >  sortedSet.parallel().toArray()
>      >
>      > to result in a sorted array.
>      >
>      > Similarly, if you expect the first one to be sorted, I'm pretty
>     sure you expect this to be sorted too:
>      >
>      >  list.sorted().filter(x -> x.color != BLUE).toArray()
>      >
>      > Which means I think you expect filter to be order-preserving.
>      >
>      >
>      > Similarly, take reduce.  Typically a reducing function is
>     expected to be associative but not commutative.  But that places a
>     constraint on order; we can't just feed them to the reducer in
>     random order.  And I think there is no getting around this one --
>     requiring reducing functions be commutative is too strong a
>     requirement.  So there's at least one example where we absolutely
>     must pay attention to order.  (The cost of preserving order here is
>     a little extra bookkeeping in the decomposition; we have to keep
>     track of the order of a given node's children (e.g., left and right
>     child pointers), so the children's results can be combined properly.)
>      >
>      >> Is it feasible to identify and implement order-preserving operations
>      >> without either sacrificing direct performance (to do the op in
>     an order
>      >> preserving way) or requiring an additional sort step?
>      >
>      > We of course want to minimize the cost of preserving order where
>     needed.  And many ops have friendly parallel solutions that don't
>     involve arbitrary sequencing or extra copying.  For example, we can
>     parallelize list.parallel().map().toArray() perfectly (modulo some
>     assumptions about splitting), where we decompose the input list in
>     such a way that we know where in the output array each chunk is
>     going to go, and in parallel, compute and write the mapped results
>     to exactly the right place in the output array.  So order
>     preservation is not always expensive.  Other ops have minor costs of
>     preserving order (like the reduce example.)  Still others have
>     bigger costs.
>      >
>      > But we don't want to pay for ordering where it is not wanted, and
>     this includes operations for which ordering is not valuable.
>
>     --
>     Andrey Breslav
>     http://jetbrains.com
>     Develop with pleasure!
>
>