Encounter order
Brian Goetz
brian.goetz at oracle.com
Mon Oct 22 21:05:17 PDT 2012
For parallel calculations, there are several different things we might
mean by order. In the context of bulk operations, I'll (try to) use
these terms:
- Encounter order. A source is said to have an encounter order when
the order in which elements are yielded by iterator / forEach is
predictable and meaningful. (This is a generalization of spatial order,
since some sources may be purely generative rather than representing
data structures.)
Sources such as arrays, lists, queues, sorted collections, and IO
channels have an encounter order; sources like HashSets or the key set
of a HashMap do not (though their implementation may have a predictable
iteration order anyway.)
- Arrival order. Arrival order (aka temporal order) is the time order
in which elements may arrive at a stage of an operation pipeline. In
sequential calculations, the encounter order is generally the arrival
order, but in parallel calculations may not be. Sometimes this is good,
sometimes not.
It is easy to track (via the stream flags mechanism) whether a stream
source has (or more precisely, supports) an encounter order, and
similarly easy to track whether an operation preserves that order. We
also don't start any processing until we've seen the whole pipeline.
Some operations have intrinsic semantic constraints that require us to
produce a result that is consistent with processing the elements in
encounter order. For example, a fold/reduce operation across a list
using an associative but not commutative operation. (Though this can
still be parallelized efficiently.) Others should probably be
constrained to preserve order just because this is consistent with user
expectations (e.g., applying the map function x -> x*2 to [ 1, 2, 3 ]
should probably yield [ 2, 4, 6 ], not [ 4, 6, 2 ].) Other operations,
such as forEach, may have neither constraint.
Like any other constraint, requiring processing to be done consistently
with encounter order has a cost.
Finally, some operations have targets (like into or toArray), which
might themselves support or not support an encounter order.
What I am trying to accomplish here is to identify reasonable user
expectations about order preservation for the (source, operation,
target) combinations we have, and whether we want to / need to add
additional operations to give users more control over ordering.
Here are some starter questions. Let "list" contain the numbers 1..10.
What should the following expressions compute?
list.parallel().toArray()
list.parallel().filter(e -> e > 5).toArray();
list.parallel().into(new ArrayList<>());
List target = new ArrayList();
list.parallel().forEach(e -> target.add(e));
I think the first one "obviously" should result in an array [ 1..10 ].
And the last should "obviously" not be constrained at all as to order,
only that it print the numbers 1 through 10 in some order. The third
one, though, isn't completely obvious what the right answer is.
What about this:
Map m = list.parallel().groupBy(e -> e % 2 == 0);
This partitions the list into two parts; should the elements in each
part be in the encounter order from the original list? (In other words,
should this result in { 0 -> [ 2, 4, 6, 8, 10 ], 1 -> [ 1, 3, 5, 7, 9 ]
} ?)
The model I'm targeting is: IF the source has an encounter order AND the
operation must preserve encounter order, then we need to be sensitive to
encounter order, otherwise anything goes. Is this the right model? If
so, what are the preservation settings for each operation?
(A secondary consideration is that there are often situations where the
computation structurally suggests an encounter order, but the user
doesn't actually care, and then is paying for ordering he doesn't want.
But there are fixes for this.)
Lots of more detailed questions but I'll let people respond to these.
More information about the lambda-libs-spec-experts
mailing list