Encounter order

Brian Goetz brian.goetz at oracle.com
Tue Oct 23 06:43:04 PDT 2012


> On the other hand I might be a little upset if I have to re-sort my
> billion element collection after filtering out the blue blocks ;-)

Yes, this is the sort of user expectation I'm trying to make more 
precise.  It's easy to say "no order at all" (and this is fine for 
x.parallel().forEach(...)), but I think we do have expectations of order 
preservation, and I am trying to tease them out.

Whether an encounter order is provided by the source or an intermediate 
stage should not matter.  So, if you expect:

   list.parallel().sorted().toArray()

to result in a sorted array, then I think you should also expect

   sortedSet.parallel().toArray()

to result in a sorted array.
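To make that concrete, here is a sketch using the java.util.stream API as it eventually shipped (where the draft spellings above correspond to parallelStream() and sorted()); both pipelines produce an ascending array:

```java
import java.util.*;

public class OrderFromSource {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(5, 3, 1, 4, 2);

        // Encounter order imposed by an intermediate stage (sorted):
        Object[] a = list.parallelStream().sorted().toArray();

        // Encounter order provided by the source itself (a SortedSet):
        SortedSet<Integer> set = new TreeSet<>(list);
        Object[] b = set.parallelStream().toArray();

        // Both come out in ascending order, even in parallel.
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(b)); // [1, 2, 3, 4, 5]
    }
}
```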

Similarly, if you expect the first one to be sorted, I'm pretty sure you 
expect this to be sorted too:

   list.sorted().filter(x -> x.color != BLUE).toArray()

Which means I think you expect filter to be order-preserving.
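As a small illustration -- with odd numbers standing in for the non-blue blocks (a hypothetical stand-in for the color test) -- the elements that survive the filter keep their sorted order:

```java
import java.util.*;
import java.util.stream.*;

public class FilterKeepsOrder {
    public static void main(String[] args) {
        int[] out = IntStream.of(5, 3, 1, 4, 2)
            .sorted()                 // 1, 2, 3, 4, 5
            .filter(x -> x % 2 != 0)  // stand-in for "color != BLUE"
            .toArray();
        // Survivors appear in their sorted order: [1, 3, 5]
        System.out.println(Arrays.toString(out));
    }
}
```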


Similarly, take reduce.  Typically a reducing function is expected to be 
associative but not commutative.  But that places a constraint on order; 
we can't just feed the elements to the reducer in arbitrary order.  And 
I think there is no getting around this one -- requiring reducing 
functions to be commutative is too strong a requirement.  So there's at 
least one example where we absolutely must pay attention to order.  (The 
cost of preserving order here is a little extra bookkeeping in the 
decomposition: we have to keep track of the order of a given node's 
children (e.g., left and right child pointers), so the children's 
results can be combined in the right order.)
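String concatenation is a handy illustration, since it is associative but not commutative; a correct parallel reduce must therefore combine the partial results in encounter order:

```java
import java.util.stream.*;

public class OrderedReduce {
    public static void main(String[] args) {
        // Concatenation is associative ((a+b)+c == a+(b+c)) but not
        // commutative (a+b != b+a), so the framework must respect the
        // order of each node's children when combining chunk results.
        String s = Stream.of("a", "b", "c", "d", "e", "f")
            .parallel()
            .reduce("", String::concat);
        System.out.println(s); // always "abcdef", however the input splits
    }
}
```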

> Is it feasible to identify and implement order-preserving operations
> without either sacrificing direct performance (to do the op in an order
> preserving way) or requiring an additional sort step?

We of course want to minimize the cost of preserving order where it is 
needed.  And many ops have friendly parallel solutions that don't 
involve arbitrary sequencing or extra copying.  For example, we can 
parallelize list.parallel().map().toArray() perfectly (modulo some 
assumptions about splitting): we decompose the input list in such a way 
that we know where in the output array each chunk is going to go, and 
then, in parallel, compute and write the mapped results to exactly the 
right places in the output array.  So order preservation is not always 
expensive.  Other ops have minor costs of preserving order (like the 
reduce example).  Still others have bigger costs.
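A minimal sketch of that map-into-known-slots idea, assuming a random-access input list (the helper name here is hypothetical, not part of any API):

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

public class IndexedParallelMap {
    // Because the input is random-access, each element's slot in the
    // output array is known up front, so workers can write results
    // directly into place in parallel -- order is preserved with no
    // extra sequencing or copying step.
    static <T, R> R[] map(List<T> input, Function<T, R> f,
                          IntFunction<R[]> newArray) {
        R[] out = newArray.apply(input.size());
        IntStream.range(0, input.size())
                 .parallel()
                 .forEach(i -> out[i] = f.apply(input.get(i)));
        return out;
    }

    public static void main(String[] args) {
        Integer[] squares =
            map(Arrays.asList(1, 2, 3, 4), x -> x * x, Integer[]::new);
        System.out.println(Arrays.toString(squares)); // [1, 4, 9, 16]
    }
}
```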

But we don't want to pay for ordering where it is not wanted, and this 
includes operations, like forEach, for which ordering is not valuable.


More information about the lambda-libs-spec-experts mailing list