Confusion (or bugs) regarding the 'UNORDERED' Collector Characteristic

Chris Dennis chris.w.dennis at gmail.com
Fri Oct 27 20:56:02 UTC 2017


Hi All,

I’m very confused about what was intended with the ‘UNORDERED’ Collector characteristic.  My logical inference was that unordered as a Collector characteristic meant that the result of this collector applied to any stream was independent of the order of element delivery.  This means that the stream backend could safely elide any ordered characteristic of the stream driving the collector and open up more possible execution pathways (parallelism for example).  The javadoc for unordered is two sentences:

/**
 * Indicates that the collection operation does not commit to preserving
 * the encounter order of input elements.  (This might be true if the
 * result container has no intrinsic order, such as a {@link Set}.)
 */

The first sentence makes sense, although it is phrased in a way that also explains the behavior seen if you flag an order sensitive collector as unordered.  The second sentence is weird, it implies a relation between collectors, encounter order, and an inherent ordering or iteration order of a collection object that might be the target of a collection. To me the important property of a Collection object with regards to ordering of a Collector that fills it is whether reordering the insertions to the collection can change the resultant state, for a Set this is clearly true since only the first presented element that is equal will be stored.

There is also a paragraph in the Collector javadoc:

* <p>For collectors that do not have the {@code UNORDERED} characteristic,
* two accumulated results {@code a1} and {@code a2} are equivalent if
* {@code finisher.apply(a1).equals(finisher.apply(a2))}.  For unordered
* collectors, equivalence is relaxed to allow for non-equality related to
* differences in order.  (For example, an unordered collector that accumulated
* elements to a {@code List} would consider two lists equivalent if they
* contained the same elements, ignoring order.)

This seems weird, why would we try to define correctness of a parallel reduction against a collector that was sensitive to ordering.

Finally, when investigating the properties of the collectors returned from the Collectors static factory I was surprised to discover that none of the collectors that are truly unordered were marked as such (counting, min, max, averaging, summing, summarizing), and the only collector that was marked as unordered was Collectors.toSet(), which although it is explicitly marked as unordered seems like it really shouldn’t be.

Whats going on here?  Which parts of all this are intended and which (if any) are bugs?

Thanks in advance for any enlightenment,

Chris Dennis

P.S. Coincidentally the unordered toSet() collector works perfectly at the moment due to the happy interaction of the Spliterator.trySplit() contract and the folding/combining behavior of the stream implementations parallel reduction algorithm.


More information about the core-libs-dev mailing list