Encounter order

Andrey Breslav andrey.breslav at jetbrains.com
Tue Oct 23 11:06:01 PDT 2012


My first hunch is "of course parallel() has no order guarantees". 

And my hunch is that sorted() should be a terminal operation (for a reason other than its interaction with parallel(), one I expressed earlier on the lambda-spec-experts list), so the examples presented so far do not change my view much.

Which means I'm OK with having no order preserved after parallel().

But if preserving the order comes at a price so low that we can afford it (performance-wise and, more importantly, explanation-wise), why not?

So the question for me is only "how much does it cost?"

On Oct 23, 2012, at 17:43 , Brian Goetz wrote:

>> On the other hand I might be a little upset if I have to re-sort my
>> billion element collection after filtering out the blue blocks ;-)
> 
> Yes, this is the sort of user expectation I'm trying to make more precise.  It's easy to say "no order at all" (and this is fine for x.parallel().forEach(...)), but I think we do have expectations of order preservation, and I am trying to tease them out.
> 
> Whether an encounter order is provided by the source or an intermediate stage should not matter.  So, if you expect:
> 
>  list.parallel().sorted().toArray()
> 
> to result in a sorted array, then I think you should also expect
> 
>  sortedSet.parallel().toArray()
> 
> to result in a sorted array.
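
For concreteness, here is a sketch of that symmetry, written against Collection.parallelStream() and the java.util.stream names rather than the draft API quoted above; treat the exact spelling as illustrative:

    import java.util.*;
    import java.util.stream.*;

    public class EncounterOrderDemo {
        public static void main(String[] args) {
            List<Integer> list = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);

            // Order imposed by an intermediate stage: sorted() establishes
            // an encounter order that toArray() should preserve.
            Object[] fromSort = list.parallelStream().sorted().toArray();
            System.out.println(Arrays.toString(fromSort));   // [1, 1, 2, 3, 4, 5, 6, 9]

            // Order provided by the source: a SortedSet already carries an
            // encounter order, so no intermediate sort is needed.
            SortedSet<Integer> set = new TreeSet<>(list);
            Object[] fromSource = set.parallelStream().toArray();
            System.out.println(Arrays.toString(fromSource)); // [1, 2, 3, 4, 5, 6, 9]
        }
    }
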
> 
> Similarly, if you expect the first one to be sorted, I'm pretty sure you expect this to be sorted too:
> 
>  list.parallel().sorted().filter(x -> x.color != BLUE).toArray()
> 
> Which means I think you expect filter to be order-preserving.
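
Continuing the sketch above (same caveat about API spelling): filtering an ordered parallel stream should preserve the encounter order, so the survivors come out still sorted:

    List<Integer> sortedList = Arrays.asList(1, 2, 3, 4, 5, 6);

    // filter() drops elements but should not reorder the ones it keeps.
    Object[] odds = sortedList.parallelStream()
                              .filter(x -> x % 2 != 0)
                              .toArray();

    System.out.println(Arrays.toString(odds)); // [1, 3, 5] -- still sorted
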
> 
> 
> Similarly, take reduce.  Typically a reducing function is expected to be associative but not commutative.  But that places a constraint on order; we can't just feed the elements to the reducer in random order.  And I think there is no getting around this one -- requiring reducing functions to be commutative is too strong a requirement.  So there's at least one example where we absolutely must pay attention to order.  (The cost of preserving order here is a little extra bookkeeping in the decomposition: we have to keep track of the order of a given node's children, e.g., via left and right child pointers, so that the children's results can be combined properly.)
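
String concatenation is a handy illustration of this: it is associative but not commutative, so a parallel reduce only gives the right answer if partial results are combined in encounter order. A sketch, with the same API caveat as above:

    List<String> parts = Arrays.asList("en", "coun", "ter");

    // concat is associative: ("en" + "coun") + "ter" == "en" + ("coun" + "ter"),
    // but not commutative: "en" + "coun" != "coun" + "en".
    String word = parts.parallelStream().reduce("", String::concat);

    // Always prints "encounter": however the list is split, each node
    // combines its left child's result before its right child's.
    System.out.println(word);
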
> 
>> Is it feasible to identify and implement order-preserving operations
>> without either sacrificing direct performance (to do the op in an order
>> preserving way) or requiring an additional sort step?
> 
> We of course want to minimize the cost of preserving order where needed.  And many ops have friendly parallel solutions that don't involve arbitrary sequencing or extra copying.  For example, we can parallelize list.parallel().map(...).toArray() perfectly (modulo some assumptions about splitting): we decompose the input list in such a way that we know where in the output array each chunk will go, and then, in parallel, compute and write the mapped results to exactly the right places in the output array.  So order preservation is not always expensive.  Other ops have minor costs for preserving order (like the reduce example).  Still others have bigger costs.
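
A hypothetical sketch of that map-into-known-slots idea (parallelMapToArray and f are made-up names, not anything in the draft API): because splitting an array tells each chunk exactly which output indices it owns, every element can be written straight to its final position with no buffering or reordering step:

    import java.util.function.IntUnaryOperator;
    import java.util.stream.IntStream;

    static int[] parallelMapToArray(int[] input, IntUnaryOperator f) {
        int[] out = new int[input.length];
        // Each index i owns exactly output slot i, so the parallel writes
        // never collide and no reordering step is needed afterwards.
        IntStream.range(0, input.length)
                 .parallel()
                 .forEach(i -> out[i] = f.applyAsInt(input[i]));
        return out;
    }
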
> 
> But we don't want to pay for ordering where it is not wanted, and this includes operations for which ordering is not valuable.

--
Andrey Breslav
http://jetbrains.com
Develop with pleasure!



