Encounter order
Brian Goetz
brian.goetz at oracle.com
Tue Oct 23 13:45:56 PDT 2012
OK, let me try this a different way.
Let's separate *result* from *side effects*. Parallelism can change the
timing of side-effects, but (I argue) should not change the result of a
computation (e.g., summing integers in parallel should yield the same
result as summing them sequentially (modulo overflow.)) For better or
worse, we're used to a world where too much is done via side-effects, so
perhaps that's where all of this "no ordering" assumption is coming
from? If so, let's ignore side effects for a second, since many
calculations can be expressed purely functionally, such as :
[ 1, 2, 3 ].map( x -> x*2 )
Is there anyone claiming the answer is NOT required to be
[ 2, 4, 6 ]
? Because this is what I'm hearing when I hear people (three of them,
now) say "encounter order should be ignored." (Alternately, this claim
amounts to saying that list.parallel().map() should return a multiset
rather than a list.)
On 10/23/2012 2:17 PM, Joe Bowbeer wrote:
> This is also my expectation/intuition regarding sorted().
>
> On Oct 23, 2012 11:06 AM, "Andrey Breslav" <andrey.breslav at jetbrains.com
> <mailto:andrey.breslav at jetbrains.com>> wrote:
>
> My first hunch is "of course parallel() has no order guarantees".
>
> And my hunch is that sorted() should be a terminal operation (for a
> different reason than its interaction with parallel(), which I
> expressed earlier on the lambda-spec-experts list),
> so the examples presented so far do not change my view that much.
>
> Which means I'm OK with having no order preserved after parallel.
>
> But if preserving the order comes at a price so low that we can
> afford it (performance-wise and, more importantly,
> explanation-wise), why not?
>
> So the question for me is only "how much does it cost?"
>
> On Oct 23, 2012, at 17:43 , Brian Goetz wrote:
>
> >> On the other hand I might be a little upset if I have to re-sort my
> >> billion element collection after filtering out the blue blocks ;-)
> >
> > Yes, this is the sort of user expectation I'm trying to make more
> precise. It's easy to say "no order at all" (and this is fine for
> x.parallel().forEach(...)), but I think we do have expectations of
> order preservation, and I am trying to tease them out.
> >
> > Whether an encounter order is provided by the source or an
> intermediate stage should not matter. So, if you expect:
> >
> > list.parallel().sort().toArray()
> >
> > to result in a sorted array, then I think you should also expect
> >
> > sortedSet.parallel().toArray()
> >
> > to result in a sorted array.
> >
> > Similarly, if you expect the first one to be sorted, I'm pretty
> sure you expect this to be sorted too:
> >
> > list.sorted().filter(x -> x.color != BLUE).toArray()
> >
> > Which means I think you expect filter to be order-preserving.
> >
> >
> > Similarly, take reduce. Typically a reducing function is
> expected to be associative but not commutative. But that places a
> constraint on order; we can't just feed them to the reducer in
> random order. And I think there is no getting around this one --
> requiring reducing functions be commutative is too strong a
> requirement. So there's at least one example where we absolutely
> must pay attention to order. (The cost of preserving order here is
> a little extra bookkeeping in the decomposition; we have to keep
> track of the order of a given node's children (e.g., left and right
> child pointers), so the children's results can be combined properly.)
> >
> >> Is it feasible to identify and implement order-preserving operations
> >> without either sacrificing direct performance (to do the op in
> an order
> >> preserving way) or requiring an additional sort step?
> >
> > We of course want to minimize the cost of preserving order where
> needed. And many ops have friendly parallel solutions that don't
> involve arbitrary sequencing or extra copying. For example, we can
> parallelize list.parallel().map().toArray() perfectly (modulo some
> assumptions about splitting), where we decompose the input list in
> such a way that we know where in the output array each chunk is
> going to go, and in parallel, compute and write the mapped results
> to exactly the right place in the output array. So order
> preservation is not always expensive. Other ops have minor costs of
> preserving order (like the reduce example.) Still others have
> bigger costs.
> >
> > But we don't want to pay for ordering where it is not wanted, and
> this includes operations for which ordering is not valuable.
>
> --
> Andrey Breslav
> http://jetbrains.com
> Develop with pleasure!
>
>
More information about the lambda-libs-spec-experts
mailing list