Encounter order
Brian Goetz
brian.goetz at oracle.com
Tue Oct 23 06:43:04 PDT 2012
> On the other hand I might be a little upset if I have to re-sort my
> billion element collection after filtering out the blue blocks ;-)
Yes, this is the sort of user expectation I'm trying to make more
precise. It's easy to say "no order at all" (and this is fine for
x.parallel().forEach(...)), but I think we do have expectations of order
preservation, and I am trying to tease them out.
Whether an encounter order is provided by the source or an intermediate
stage should not matter. So, if you expect:
list.parallel().sort().toArray()
to result in a sorted array, then I think you should also expect
sortedSet.parallel().toArray()
to result in a sorted array.
Similarly, if you expect the first one to be sorted, I'm pretty sure you
expect this to be sorted too:
list.sorted().filter(x -> x.color != BLUE).toArray()
Which means I think you expect filter to be order-preserving.
Similarly, take reduce. Typically a reducing function is expected to be
associative but not commutative. But that places a constraint on order;
we can't just feed them to the reducer in random order. And I think
there is no getting around this one -- requiring reducing functions be
commutative is too strong a requirement. So there's at least one
example where we absolutely must pay attention to order. (The cost of
preserving order here is a little extra bookkeeping in the
decomposition; we have to keep track of the order of a given node's
children (e.g., left and right child pointers), so the children's
results can be combined properly.)
> Is it feasible to identify and implement order-preserving operations
> without either sacrificing direct performance (to do the op in an order
> preserving way) or requiring an additional sort step?
We of course want to minimize the cost of preserving order where needed.
And many ops have friendly parallel solutions that don't involve
arbitrary sequencing or extra copying. For example, we can parallelize
list.parallel().map().toArray() perfectly (modulo some assumptions about
splitting), where we decompose the input list in such a way that we know
where in the output array each chunk is going to go, and in parallel,
compute and write the mapped results to exactly the right place in the
output array. So order preservation is not always expensive. Other ops
have minor costs of preserving order (like the reduce example.) Still
others have bigger costs.
But we don't want to pay for ordering where it is not wanted, and this
includes operations for which ordering is not valuable.
More information about the lambda-libs-spec-experts
mailing list