Encounter order: take 2

Mike Duigou mike.duigou at oracle.com
Thu Jan 31 15:27:17 PST 2013


On Jan 31 2013, at 03:50 , Paul Sandoz wrote:

> Hi,
> 
> Here is an update. Thanks to Brian and Neal for their feedback.
> 
> Previously i told a minor mistruth. For the terminal operation collect(toSet()) we currently do not have a way for the Collector to state that encounter order is not preserved and thus propagate that information to the collect terminal operation.
> 
> Clarification: although the rules for an intermediate operation to preserve encounter order may seem complex it is cheap to implement.
> 
> Paul.
> 
> --
> 
> An input stream is a stream of elements input to an intermediate operation or terminal operation.
> 
> An output stream is a stream of elements output from an intermediate operation.
> 
> A stream source is a stream of elements input to the first operation in a sequence of zero or more intermediate operations and one final terminal operation.
> 
> An intermediate operation is a computation that transforms an input stream to an output stream.
> 
> A terminal operation is a computation that produces a result from an input stream.
> 
> A stream source has or does not have encounter order.
> A stream source has encounter order if traversal of all its elements will always result in encountering those elements in the same order.
> A stream source does not have encounter order if traversal of all its elements may not always result in encountering those elements in the same order.
> List and arrays are sources that have encounter order (arrays can be said to also have a spatial order).
> HashSet and PriorityQueue are sources that do not have encounter order.
> 
> A terminal operation preserves or does not preserve encounter order when computing a result.
> If the input stream has encounter order and the terminal operation preserves encounter order then the computation must respect encounter order and associativity when producing the result. (Elements may be arbitrarily grouped but order must be respected for those elements and intermediate results computed from groups of elements or previously computed intermediate results.)
> If the input stream has encounter order and the terminal operation does not preserve encounter order then the computation may not respect encounter order and is free to process elements of the input stream in any order when producing the result.
> If the input stream does not have encounter order and the terminal operation preserves encounter order then the computation may impute encounter order (directly from traversal of the input stream) when producing the result.
> If the input stream does not have encounter order and the terminal operation does not preserve encounter order then the computation is free to process elements of the input stream in any order when producing the result.
> Non-preserving terminal operations are forEach, forEachUntil, findAny, match{Any, None, All}, collect(toSet()) and collectUnordered.
> 
> An intermediate operation may inject encounter order so that the output stream and corresponding input stream to the next intermediate operation or terminal operation has encounter order.
> The sorted() operation injects encounter.

encounter order.

> 
> An intermediate operation may clear encounter order so that the output stream and corresponding input stream to the next intermediate operation or terminal operation does not have encounter order.
> There are no such operations implemented. 
> (Previously the unordered() operation cleared encounter order.)
> 
> Otherwise an intermediate operation must preserve encounter order if required to do so (see next paragraphs).
> 
> An intermediate operation may choose to apply a different algorithm if encounter order of input stream must be preserved or not.
> The distinct() operation will, when evaluating in parallel, use a ConcurrentHashMap to store unique elements if encounter order does not need to be preserved, otherwise if encounter order needs to be preserved a fold will be performed (equivalent of, in parallel, map each element to a singleton linked set then associatively reduce, left-to-right, the linked sets to one linked set).

Without unordered() how is the CHM version accessed if the source is an ArrayList?

> An intermediate operation must preserve encounter order of output stream if:
> 
> a.1) the input stream to the intermediate operation has encounter order (either because the stream source has encounter order or because a previous intermediate operation injects encounter order); and
> a.2) the terminal operation preserves encounter order.
> 
> An intermediate operation may not preserve encounter order of the output stream if:
> 
> b.1) the input stream to the intermediate operation does not have encounter order (either because the stream source does not have encounter order or because a previous intermediate operation clears encounter order); or
> b.2) the terminal operation does not preserve encounter order *and* the intermediate operation is in a sequence of operations, to be computed, where the last operation in the sequence is the terminal operation and all operations in the sequence are computed in parallel.
> 
> Rule b.2 above ensures that encounter order is preserved for the following pipeline on the sequential().forEach():
> 
> list.parallelStream().distinct().sequential().forEach()
> 
> i.e. the distinct() intermediate operation will preserve the encounter order of the list stream source.

I find this result surprising (and disappointing). Users are going to be surprised by the poor performance of using parallelStream in this case.

Mike.



More information about the lambda-libs-spec-experts mailing list