Encounter order

Brian Goetz brian.goetz at oracle.com
Thu Oct 25 09:00:22 PDT 2012


Since we seem to be agreed that the question is not "what should 
parallel foo() do", but "what should foo() do", lets walk through the 
operations.

     Stream<T> filter(Predicate<? super T> predicate);
     <R> Stream<R> map(Mapper<? super T, ? extends R> mapper);
     <R> Stream<R> flatMap(FlatMapper<? super T, R> mapper);
     Stream<T> tee(Block<? super T> block);

For stateless intermediate operations (filter, map, flatMap, tee), they 
are all capable of preserving encounter order at basically no cost.  So 
if the stream source has an encounter order and the terminal operation 
wants to use encounter order, these guys just play along.

     Stream<T> limit(int n);
     Stream<T> skip(int n);
     Optional<T> findFirst();

These are basically *about* encounter order, so obviously they would 
respect it.  So, question is, what do they do for streams that have no 
defined encounter order?  (My sense is: infer one from the iteration 
order.  set.stream().findFirst() is perfectly reasonable, its just that 
"first" doesn't mean as much as it does on a list.)  Preserving 
encounter order in parallel does have a real cost, but if the user asked 
for the first seven in encounter order, that's what we should give them.

     Optional<T> findAny();

This one is explicitly about *ignoring* encounter order and optimizing 
for fastest return.

     void forEach(Block<? super T> block);

This is only about side effects, so encounter order shouldn't enter into 
the calculation.  Elements are fed to the block in whatever order and 
thread they are available.

     Object[] toArray();

This one seems pretty clear that you expect to see the elements in the 
array in encounter order.  (Again, if no encounter order, we should make 
one up based on iterator/forEach order.)  In general we can still 
parallelize efficiently here (this depends on how early we know the 
sizes of each chunk; if we know these at split time the cost is 
basically zero.)

     Stream<T> sorted(Comparator<? super T> comparator);

Since sort explicitly reorders things, one might think that encounter 
order is totally irrelevant.  However, stability (preserving encounter 
order for values that are equal according to the comparator) is a 
desirable property of sorting (and stability does add considerable cost 
to parallel sorting.)  So there's a decision to make here.  (And 
(Andrey) this is relevant whether sorting is an intermediate or terminal 
operation.)

     Stream<T> cumulate(BinaryOperator<T> operator);

This one is explicitly about encounter order too.  There are efficient 
parallel algorithms for this (which is kind of the whole point of 
including it at all.)

     boolean anyMatch(Predicate<? super T> predicate);
     boolean allMatch(Predicate<? super T> predicate);
     boolean noneMatch(Predicate<? super T> predicate);

These are independent of encounter order.

     Optional<T> reduce(BinaryOperator<T> op);
     T reduce(T base, BinaryOperator<T> op);
     <U> U fold(Factory<U> baseFactory,
                Combiner<U, T, U> reducer,
                BinaryOperator<U> combiner);

Whether these are sensitive to encounter order depends on whether the 
operators are commutative or not.  Traditionally, reduce/fold operators 
are expected to be associative but not commutative.  The cost of 
respecting order in parallel here are minor; basically the bookkeeping 
overhead of remembering who your children are and their order, and the 
delays associated with tasks not completing until all their children 
complete.

If the source has no encounter order, my inclination here is (again) to 
assume that the programmer understood that, impute an encounter order 
from the implementation, and feed the elements in that order.  If the 
user provides a reducer that is associative but not commutative (e.g., 
String::concat), he may get a scrambled result, but this is no different 
from what happens when you iterate over a Set today.

     Stream<T> uniqueElements();

This one is not obvious to me.  On the one hand, yielding results in 
encounter order seems polite (like stable sorting, but more so); on the 
other, preserving order in parallel is fairly expensive.

     <U> Map<U, Collection<T>> groupBy(Mapper<? super T, ? extends U> 
classifier);
     <U, W> Map<U, W> reduceBy(Mapper<? super T, ? extends U> classifier,
                               Factory<W> baseFactory,
                               Combiner<W, T, W> reducer);

Again, I am not sure what to do about these.  Preserving encounter order 
in parallel definitely has a cost here.

     <A extends Destination<? super T>> A into(A target);

This one is interesting, because in addition to "can the target support 
an order", we also have to be careful about "is the target thread-safe." 
  What we've done here is implement it so that the target decides what 
to do; it gets passed the stream, and can choose to extract elements in 
serial or in parallel.  Standard non-thread-safe collections force 
sequential insertion (though upstream operations can still proceed in 
parallel, so if you do list.filter().into(new ArrayList()), the 
filtering still proceeds in parallel, then the elements are collected in 
order and can be sequentially inserted into the new list.)


So overall, the problematic operations are sort, uniqueElements, 
groupBy, and reduceBy, because these are the ones where the cost is not 
minor.  Secondarily, we need to vet whether our simplifying assumptions 
about imputing an encounter order when one is needed are acceptable.

In cases where we know there is no encounter order, the implementation 
is free (and does) use the more efficient approach.  It is also trivial 
and basically cost-free (O(1) time cost) to introduce a new op, 
"unordered()", which would strip a stream of its encounter order.  So if 
uniqueElements were deemed to require preserving order, but if the user 
didn't care, he could:

   list.unordered().uniqueElements().toArray();

and get the faster implementation of uniqueElements.

Given this, my recommendation is:
  - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) 
ops respect encounter order even though it may be expensive
  - Provide an .unordered() op for people to opt out of the encounter 
order when they know they don't care.
  - Impute encounter order if we need one and there isn't one (rather 
than throwing)



More information about the lambda-libs-spec-observers mailing list