Encounter order
Brian Goetz
brian.goetz at oracle.com
Mon Oct 22 22:24:13 PDT 2012
> I don't have any assumptions about ordering after .parallel().
>
> list.parallel().toArray() is analogous to a parallel-ized for-loop.
> (See Parallel.ForEach in C#.)
Certainly, list.parallel().forEach() is analogous to a parallelized
for-loop.
> If I wanted it to execute in order, I wouldn't parallel() it.
But I didn't anything about *executing* in order. The question was, to
what extent should order be preserved.
For example, array.parallel().map().toArray() parallelizes perfectly
while preserving encounter order, though does not execute in order.
> Or, if I wanted to "preserve" the order, I would zip-with-index before
> parallel execution.
Does that mean you assume that reducing functions passed to reduce are
necessarily commutative?
What about
array.parallel().sorted().toArray()
? Would you expect the result to appear in the array in sorted order?
>
> --Joe
>
> On Mon, Oct 22, 2012 at 9:05 PM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
> For parallel calculations, there are several different things we
> might mean by order. In the context of bulk operations, I'll (try
> to) use these terms:
>
> - Encounter order. A source is said to have an encounter order
> when the order in which elements are yielded by iterator / forEach
> is predictable and meaningful. (This is a generalization of spatial
> order, since some sources may be purely generative rather than
> representing data structures.)
>
> Sources such as arrays, lists, queues, sorted collections, and IO
> channels have an encounter order; sources like HashSets or the key
> set of a HashMap do not (though their implementation may have a
> predictable iteration order anyway.)
>
> - Arrival order. Arrival order (aka temporal order) is the time
> order in which elements may arrive at a stage of an operation
> pipeline. In sequential calculations, the encounter order is
> generally the arrival order, but in parallel calculations may not
> be. Sometimes this is good, sometimes not.
>
> It is easy to track (via the stream flags mechanism) whether a
> stream source has (or more precisely, supports) an encounter order,
> and similarly easy to track whether an operation preserves that
> order. We also don't start any processing until we've seen the
> whole pipeline.
>
> Some operations have intrinsic semantic constraints that require us
> to produce a result that is consistent with processing the elements
> in encounter order. For example, a fold/reduce operation across a
> list using an associative but not commutative operation. (Though
> this can still be parallelized efficiently.) Others should probably
> be constrained to preserve order just because this is consistent
> with user expectations (e.g., applying the map function x -> x*2 to
> [ 1, 2, 3 ] should probably yield [ 2, 4, 6 ], not [ 4, 6, 2 ].)
> Other operations, such as forEach, may have neither constraint.
>
> Like any other constraint, requiring processing to be done
> consistently with encounter order has a cost.
>
> Finally, some operations have targets (like into or toArray), which
> might themselves support or not support an encounter order.
>
> What I am trying to accomplish here is to identify reasonable user
> expectations about order preservation for the (source, operation,
> target) combinations we have, and whether we want to / need to add
> additional operations to give users more control over ordering.
>
> Here are some starter questions. Let "list" contain the numbers
> 1..10. What should the following expressions compute?
>
> list.parallel().toArray()
>
> list.parallel().filter(e -> e > 5).toArray();
>
> list.parallel().into(new ArrayList<>());
>
> List target = new ArrayList();
> list.parallel().forEach(e -> target.add(e));
>
> I think the first one "obviously" should result in an array [ 1..10
> ]. And the last should "obviously" not be constrained at all as to
> order, only that it print the numbers 1 through 10 in some order.
> The third one, though, isn't completely obvious what the right
> answer is.
>
> What about this:
>
> Map m = list.parallel().groupBy(e -> e % 2 == 0);
>
> This partitions the list into two parts; should the elements in each
> part be in the encounter order from the original list? (In other
> words, should this result in { 0 -> [ 2, 4, 6, 8, 10 ], 1 -> [ 1, 3,
> 5, 7, 9 ] } ?)
>
> The model I'm targeting is: IF the source has an encounter order AND
> the operation must preserve encounter order, then we need to be
> sensitive to encounter order, otherwise anything goes. Is this the
> right model? If so, what are the preservation settings for each
> operation?
>
> (A secondary consideration is that there are often situations where
> the computation structurally suggests an encounter order, but the
> user doesn't actually care, and then is paying for ordering he
> doesn't want. But there are fixes for this.)
>
> Lots of more detailed questions but I'll let people respond to these.
>
>
More information about the lambda-libs-spec-observers
mailing list