Encounter order: take 2

Paul Sandoz paul.sandoz at oracle.com
Fri Feb 1 01:22:02 PST 2013


On Feb 1, 2013, at 12:27 AM, Mike Duigou <mike.duigou at oracle.com> wrote:
>> 
>> An intermediate operation may clear encounter order so that the output stream and corresponding input stream to the next intermediate operation or terminal operation does not have encounter order.
>> There are no such operations implemented. 
>> (Previously the unordered() operation cleared encounter order.)
>> 
>> Otherwise an intermediate operation must preserve encounter order if required to do so (see next paragraphs).
>> 
>> An intermediate operation may choose to apply a different algorithm if encounter order of input stream must be preserved or not.
>> The distinct() operation will, when evaluating in parallel, use a ConcurrentHashMap to store unique elements if encounter order does not need to be preserved, otherwise if encounter order needs to be preserved a fold will be performed (equivalent of, in parallel, map each element to a singleton linked set then associatively reduce, left-to-right, the linked sets to one linked set).
> 
> Without unordered() how is the CHM version accessed if the source is an ArrayList?
> 

If the source has encounter order the distinct operation will choose whether to preserve encounter order or not as per clause b.2. i.e. the properties of the terminal operation are a factor.

Implementation wise the op checks if the ORDERED flag is on the bit set of flags passed to it so it is not as complicated as it sounds.


>> An intermediate operation must preserve encounter order of output stream if:
>> 
>> a.1) the input stream to the intermediate operation has encounter order (either because the stream source has encounter order or because a previous intermediate operation injects encounter order); and
>> a.2) the terminal operation preserves encounter order.
>> 
>> An intermediate operation may not preserve encounter order of the output stream if:
>> 
>> b.1) the input stream to the intermediate operation does not have encounter order (either because the stream source does not have encounter order or because a previous intermediate operation clears encounter order); or
>> b.2) the terminal operation does not preserve encounter order *and* the intermediate operation is in a sequence of operations, to be computed, where the last operation in the sequence is the terminal operation and all operations in the sequence are computed in parallel.
>> 
>> Rule b.2 above ensures that encounter order is preserved for the following pipeline on the sequential().forEach():
>> 
>> list.parallelStream().distinct().sequential().forEach()
>> 
>> i.e. the distinct() intermediate operation will preserve the encounter order of the list stream source.
> 
> I find this result surprising (and disappointing). Users are going to be surprised by the poor performance of using parallelStream in this case.
> 

The sequential() op is currently implemented as a full barrier to ensure elements are reported sequentially downstream in the same thread that created the stream.

The following will produce the same output:

  list.stream().distinct().forEach(...)
  list.parallelStream().distinct().sequential().forEach(...)

Which i think conforms to the principle of least surprise.

If performance is a concern and order is not then one should not use sequential e.g. do:

  list.parallelStream().distinct().forEach(e -> { synchronizied(this) { ... } } )

or a concurrent collect.

Paul.


More information about the lambda-libs-spec-observers mailing list