Encounter order

Mon Oct 22 21:05:17 PDT 2012

For parallel calculations, there are several different things we might 
mean by order.  In the context of bulk operations, I'll (try to) use 
these terms:

  - Encounter order.  A source is said to have an encounter order when 
the order in which elements are yielded by iterator / forEach is 
predictable and meaningful.  (This is a generalization of spatial order, 
since some sources may be purely generative rather than representing 
data structures.)

Sources such as arrays, lists, queues, sorted collections, and IO 
channels have an encounter order; sources like HashSets or the key set 
of a HashMap do not (though their implementation may have a predictable 
iteration order anyway.)

  - Arrival order.  Arrival order (aka temporal order) is the time order 
in which elements may arrive at a stage of an operation pipeline.  In 
sequential calculations, the encounter order is generally the arrival 
order, but in parallel calculations may not be.  Sometimes this is good, 
sometimes not.

It is easy to track (via the stream flags mechanism) whether a stream 
source has (or more precisely, supports) an encounter order, and 
similarly easy to track whether an operation preserves that order.  We 
also don't start any processing until we've seen the whole pipeline.

Some operations have intrinsic semantic constraints that require us to 
produce a result that is consistent with processing the elements in 
encounter order.  For example, a fold/reduce operation across a list 
using an associative but not commutative operation.  (Though this can 
still be parallelized efficiently.)  Others should probably be 
constrained to preserve order just because this is consistent with user 
expectations (e.g., applying the map function x -> x*2 to [ 1, 2, 3 ] 
should probably yield [ 2, 4, 6 ], not [ 4, 6, 2 ].) Other operations, 
such as forEach, may have neither constraint.

Like any other constraint, requiring processing to be done consistently 
with encounter order has a cost.

Finally, some operations have targets (like into or toArray), which 
might themselves support or not support an encounter order.

What I am trying to accomplish here is to identify reasonable user 
expectations about order preservation for the (source, operation, 
target) combinations we have, and whether we want to / need to add 
additional operations to give users more control over ordering.

Here are some starter questions.  Let "list" contain the numbers 1..10.  
What should the following expressions compute?

   list.parallel().toArray()

   list.parallel().filter(e -> e > 5).toArray();

   list.parallel().into(new ArrayList<>());

   List target = new ArrayList();
   list.parallel().forEach(e -> target.add(e));

I think the first one "obviously" should result in an array [ 1..10 ].  
And the last should "obviously" not be constrained at all as to order, 
only that it print the numbers 1 through 10 in some order. The third 
one, though, isn't completely obvious what the right answer is.

What about this:

   Map m = list.parallel().groupBy(e -> e % 2 == 0);

This partitions the list into two parts; should the elements in each 
part be in the encounter order from the original list?  (In other words, 
should this result in { 0 -> [ 2, 4, 6, 8, 10 ], 1 -> [ 1, 3, 5, 7, 9 ] 
} ?)

The model I'm targeting is: IF the source has an encounter order AND the 
operation must preserve encounter order, then we need to be sensitive to 
encounter order, otherwise anything goes.  Is this the right model?  If 
so, what are the preservation settings for each operation?

(A secondary consideration is that there are often situations where the 
computation structurally suggests an encounter order, but the user 
doesn't actually care, and then is paying for ordering he doesn't want.  
But there are fixes for this.)

Lots of more detailed questions but I'll let people respond to these.