Encounter order
Sam Pullara
spullara at gmail.com
Wed Oct 24 06:40:01 PDT 2012
[2, 4, 6] is my expectation, calculated in parallel or not.
Sam
All my photos are panoramas.
On Oct 23, 2012, at 5:01 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> OK, let me try this a different way.
>
> Let's separate *result* from *side effects*. Parallelism can change the timing of side effects, but (I argue) should not change the result of a computation; for example, summing integers in parallel should yield the same result as summing them sequentially (modulo overflow). For better or worse, we're used to a world where too much is done via side effects, so perhaps that's where all of this "no ordering" assumption is coming from? If so, let's ignore side effects for a second, since many calculations can be expressed purely functionally, such as:
>
> [ 1, 2, 3 ].map( x -> x*2 )
>
> Is there anyone claiming the answer is NOT required to be
>
> [ 2, 4, 6 ]
>
> ? Because this is what I'm hearing when I hear people (three of them, now) say "encounter order should be ignored." (Alternatively, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.)
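[For reference, Brian's expectation can be checked against the java.util.stream API as it eventually shipped in Java 8; note the shipped method names differ slightly from the prototype discussed in this thread. An ordered collect preserves encounter order even when the map runs in parallel:]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelMapOrder {
    public static void main(String[] args) {
        // Map over a list in parallel; the encounter order of the
        // source list is preserved in the result, so this is always
        // [2, 4, 6] regardless of execution timing.
        List<Integer> result = Arrays.asList(1, 2, 3)
                .parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
        System.out.println(result); // prints [2, 4, 6]
    }
}
```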
>
> On 10/23/2012 2:17 PM, Joe Bowbeer wrote:
>> This is also my expectation/intuition regarding sorted().
>>
>> On Oct 23, 2012 11:06 AM, "Andrey Breslav" <andrey.breslav at jetbrains.com> wrote:
>>
>> My first hunch is "of course parallel() has no order guarantees".
>>
>> And my hunch is that sorted() should be a terminal operation (for a
>> different reason than its interaction with parallel(), which I
>> expressed earlier on the lambda-spec-experts list),
>> so the examples presented so far do not change my view that much.
>>
>> Which means I'm OK with having no order preserved after parallel.
>>
>> But if preserving the order comes at a price so low that we can
>> afford it (performance-wise and, more importantly,
>> explanation-wise), why not?
>>
>> So the question for me is only "how much does it cost?"
>>
>> On Oct 23, 2012, at 17:43 , Brian Goetz wrote:
>>
>> >> On the other hand I might be a little upset if I have to re-sort my
>> >> billion element collection after filtering out the blue blocks ;-)
>> >
>> > Yes, this is the sort of user expectation I'm trying to make more
>> precise. It's easy to say "no order at all" (and this is fine for
>> x.parallel().forEach(...)), but I think we do have expectations of
>> order preservation, and I am trying to tease them out.
>> >
>> > Whether an encounter order is provided by the source or an
>> intermediate stage should not matter. So, if you expect:
>> >
>> > list.parallel().sort().toArray()
>> >
>> > to result in a sorted array, then I think you should also expect
>> >
>> > sortedSet.parallel().toArray()
>> >
>> > to result in a sorted array.
>> >
>> > Similarly, if you expect the first one to be sorted, I'm pretty
>> sure you expect this to be sorted too:
>> >
>> > list.sorted().filter(x -> x.color != BLUE).toArray()
>> >
>> > Which means I think you expect filter to be order-preserving.
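[The expectation that filter is order-preserving can be sketched with the released java.util.stream API (method names differ slightly from the prototype in this thread): a filter after a sort keeps the sorted encounter order, even in parallel.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterKeepsOrder {
    public static void main(String[] args) {
        // Sort, then filter out some elements (standing in for
        // "filter out the blue blocks"). The survivors stay in
        // sorted encounter order; no re-sort is needed.
        List<Integer> result = Arrays.asList(5, 3, 1, 4, 2)
                .parallelStream()
                .sorted()
                .filter(x -> x % 2 != 0)
                .collect(Collectors.toList());
        System.out.println(result); // prints [1, 3, 5]
    }
}
```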
>> >
>> >
>> > Similarly, take reduce. Typically a reducing function is
>> expected to be associative but not commutative. But that places a
>> constraint on order; we can't just feed them to the reducer in
>> random order. And I think there is no getting around this one --
>> requiring reducing functions be commutative is too strong a
>> requirement. So there's at least one example where we absolutely
>> must pay attention to order. (The cost of preserving order here is
>> a little extra bookkeeping in the decomposition; we have to keep
>> track of the order of a given node's children (e.g., left and right
>> child pointers), so the children's results can be combined properly.)
>> >
>> >> Is it feasible to identify and implement order-preserving operations
>> >> without either sacrificing direct performance (to do the op in
>> an order
>> >> preserving way) or requiring an additional sort step?
>> >
>> > We of course want to minimize the cost of preserving order where
>> needed. And many ops have friendly parallel solutions that don't
>> involve arbitrary sequencing or extra copying. For example, we can
>> parallelize list.parallel().map().toArray() perfectly (modulo some
>> assumptions about splitting), where we decompose the input list in
>> such a way that we know where in the output array each chunk is
>> going to go, and in parallel, compute and write the mapped results
>> to exactly the right place in the output array. So order
>> preservation is not always expensive. Other ops have minor costs of
>> preserving order (like the reduce example.) Still others have
>> bigger costs.
>> >
>> > But we don't want to pay for ordering where it is not wanted, and
>> this includes operations for which ordering is not valuable.
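[A minimal sketch of the decomposition Brian describes: because map takes each input index to the same output index, chunks can be computed in parallel and written directly into the right slots of a preallocated result array, with no reordering or extra copying. The helper name here is hypothetical, not part of any API.]

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class IndexedParallelMap {
    // Each worker writes out[i] for a distinct set of indices i,
    // so no synchronization or post-hoc sort is needed: the output
    // is in encounter order by construction.
    static int[] parallelMap(int[] input, IntUnaryOperator f) {
        int[] out = new int[input.length];
        IntStream.range(0, input.length)
                 .parallel()
                 .forEach(i -> out[i] = f.applyAsInt(input[i]));
        return out;
    }

    public static void main(String[] args) {
        int[] result = parallelMap(new int[]{1, 2, 3, 4}, x -> x * 10);
        System.out.println(Arrays.toString(result)); // prints [10, 20, 30, 40]
    }
}
```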
>>
>> --
>> Andrey Breslav
>> http://jetbrains.com
>> Develop with pleasure!
>>
>>
More information about the lambda-libs-spec-observers mailing list