Encounter order
Brian Goetz
brian.goetz at oracle.com
Tue Oct 23 07:07:27 PDT 2012
Here's a doc on Scala's behavior in this area -- they go out of their way
to preserve encounter order and require only that reducers be associative:
http://docs.scala-lang.org/overviews/parallel-collections/overview.html
On 10/23/2012 9:43 AM, Brian Goetz wrote:
>> On the other hand I might be a little upset if I have to re-sort my
>> billion-element collection after filtering out the blue blocks ;-)
>
> Yes, this is the sort of user expectation I'm trying to make more
> precise. It's easy to say "no order at all" (and this is fine for
> x.parallel().forEach(...)), but I think we do have expectations of order
> preservation, and I am trying to tease them out.
>
> Whether an encounter order is provided by the source or an intermediate
> stage should not matter. So, if you expect:
>
> list.parallel().sort().toArray()
>
> to result in a sorted array, then I think you should also expect
>
> sortedSet.parallel().toArray()
>
> to result in a sorted array.
>
> Similarly, if you expect the first one to be sorted, I'm pretty sure you
> expect this to be sorted too:
>
> list.sorted().filter(x -> x.color != BLUE).toArray()
>
> Which means I think you expect filter to be order-preserving.
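
For concreteness, here is a small sketch of those three expectations written
against the java.util.stream API as it eventually shipped; the exact method
names (parallelStream(), sorted(), filter(), toArray()) are assumptions about
the final shape, not the draft API under discussion:

    import java.util.*;
    import java.util.stream.*;

    public class EncounterOrderExpectations {
        public static void main(String[] args) {
            List<Integer> list = Arrays.asList(5, 3, 1, 4, 2);

            // Order imposed by an intermediate stage: expect a sorted array.
            Object[] a = list.parallelStream().sorted().toArray();

            // Order imposed by the source: a SortedSet's encounter order is
            // its sort order, so we expect the same sorted result.
            SortedSet<Integer> set = new TreeSet<>(list);
            Object[] b = set.parallelStream().toArray();

            // Filtering should preserve the encounter order of the survivors.
            Object[] c = list.stream()
                             .sorted()
                             .filter(x -> x % 2 != 0)  // stand-in for x.color != BLUE
                             .toArray();

            System.out.println(Arrays.toString(a));  // [1, 2, 3, 4, 5]
            System.out.println(Arrays.toString(b));  // [1, 2, 3, 4, 5]
            System.out.println(Arrays.toString(c));  // [1, 3, 5]
        }
    }
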
>
>
> Similarly, take reduce. Typically a reducing function is expected to be
> associative but not commutative. But that places a constraint on order;
> we can't just feed the elements to the reducer in random order. And I
> think there is no getting around this one -- requiring reducing functions
> to be commutative is too strong a requirement. So there's at least one
> example where we absolutely must pay attention to order. (The cost of
> preserving order here is a little extra bookkeeping in the
> decomposition; we have to keep track of the order of a given node's
> children (e.g., left and right child pointers), so the children's
> results can be combined properly.)
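
A tiny example of that constraint, again in terms of the shipped API (an
assumption on my part): string concatenation is associative but not
commutative, so a parallel reduce has to combine each node's left and right
partial results in encounter order to match the sequential answer.

    import java.util.*;
    import java.util.stream.*;

    public class AssociativeNotCommutative {
        public static void main(String[] args) {
            List<String> words = Arrays.asList("a", "b", "c", "d", "e");

            // Concatenation is associative: (x + y) + z == x + (y + z),
            // but not commutative: x + y != y + x in general.
            String sequential = words.stream().reduce("", String::concat);
            String parallel   = words.parallelStream().reduce("", String::concat);

            // Both print "abcde" because the combining step respects the
            // order of each node's children.
            System.out.println(sequential);
            System.out.println(parallel);
        }
    }
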
>
>> Is it feasible to identify and implement order-preserving operations
>> without either sacrificing direct performance (to do the op in an
>> order-preserving way) or requiring an additional sort step?
>
> We of course want to minimize the cost of preserving order where needed.
> And many ops have friendly parallel solutions that don't involve
> arbitrary sequencing or extra copying. For example, we can parallelize
> list.parallel().map().toArray() perfectly (modulo some assumptions about
> splitting), where we decompose the input list in such a way that we know
> where in the output array each chunk is going to go, and in parallel,
> compute and write the mapped results to exactly the right place in the
> output array. So order preservation is not always expensive. Other ops
> have minor costs of preserving order (like the reduce example). Still
> others have bigger costs.
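
A rough sketch of that map-into-known-slots idea, written directly against
fork/join; the class, threshold, and chunking here are made up for
illustration and are not the actual pipeline implementation:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;
    import java.util.function.IntUnaryOperator;

    public class OrderedParallelMap {

        // Maps in[lo..hi) into out[lo..hi): each chunk already knows its
        // destination slots, so no reordering or extra copying is needed.
        static final class MapTask extends RecursiveAction {
            static final int THRESHOLD = 1_000;
            final int[] in, out;
            final int lo, hi;
            final IntUnaryOperator f;

            MapTask(int[] in, int[] out, int lo, int hi, IntUnaryOperator f) {
                this.in = in; this.out = out; this.lo = lo; this.hi = hi; this.f = f;
            }

            @Override protected void compute() {
                if (hi - lo <= THRESHOLD) {
                    for (int i = lo; i < hi; i++)
                        out[i] = f.applyAsInt(in[i]);  // write to exactly the right slot
                } else {
                    int mid = (lo + hi) >>> 1;
                    invokeAll(new MapTask(in, out, lo, mid, f),
                              new MapTask(in, out, mid, hi, f));
                }
            }
        }

        public static int[] map(int[] in, IntUnaryOperator f) {
            int[] out = new int[in.length];  // output size is known up front
            ForkJoinPool.commonPool().invoke(new MapTask(in, out, 0, in.length, f));
            return out;
        }
    }
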
>
> But we don't want to pay for ordering where it is not needed, and that
> includes operations for which ordering adds no value.