Confusion (or bugs) regarding the 'UNORDERED' Collector Characteristic
Stuart Marks
stuart.marks at oracle.com
Wed Nov 1 19:55:42 UTC 2017
On 10/27/17 1:56 PM, Chris Dennis wrote:
> I’m very confused about what was intended with the ‘UNORDERED’ Collector characteristic. My logical inference was that unordered as a Collector characteristic meant that the result of this collector applied to any stream was independent of the order of element delivery. This means that the stream backend could safely elide any ordered characteristic of the stream driving the collector and open up more possible execution pathways (parallelism for example). The javadoc for unordered is two sentences:
>
> /**
> * Indicates that the collection operation does not commit to preserving
> * the encounter order of input elements. (This might be true if the
> * result container has no intrinsic order, such as a {@link Set}.)
> */
>
> The first sentence makes sense, although it is phrased in a way that also explains the behavior seen if you flag an order sensitive collector as unordered. The second sentence is weird, it implies a relation between collectors, encounter order, and an inherent ordering or iteration order of a collection object that might be the target of a collection. To me the important property of a Collection object with regards to ordering of a Collector that fills it is whether reordering the insertions to the collection can change the resultant state, for a Set this is clearly true since only the first presented element that is equal will be stored.
I think you need to be very careful making any assumptions about which of equal
elements will end up in a set. Consider the spec of Set.addAll:
Adds all of the elements in the specified collection to this set
if they're not already present.
Suppose the source collection has several non-identical elements that are equal
to each other. This says nothing about which of them is preserved, and indeed
there is no mention whatsoever about the identity of equal objects. The entire
Set spec is defined in terms of equals, not identity. So I think it's mistaken
to assume that the first of equal but non-identical objects will be stored.
(This is also the reason I changed the specification of Set.copyOf in my recent
draft, to remove the requirement that the first such element be preserved.
Set.addAll and Collectors.toSet make no such stipulation.)
> There is also a paragraph in the Collector javadoc:
>
> * <p>For collectors that do not have the {@code UNORDERED} characteristic,
> * two accumulated results {@code a1} and {@code a2} are equivalent if
> * {@code finisher.apply(a1).equals(finisher.apply(a2))}. For unordered
> * collectors, equivalence is relaxed to allow for non-equality related to
> * differences in order. (For example, an unordered collector that accumulated
> * elements to a {@code List} would consider two lists equivalent if they
> * contained the same elements, ignoring order.)
>
> This seems weird, why would we try to define correctness of a parallel reduction against a collector that was sensitive to ordering.
This is indeed a bit odd, but it makes sense. You might have a collector that
collects into a List, because you want to collect duplicates. But you might not
actually care about the order; you just want all the elements. The fact that
List stores elements in some order is irrelevant. In this case such a collector
would quite reasonably have the UNORDERED characteristic.
> Finally, when investigating the properties of the collectors returned from the Collectors static factory I was surprised to discover that none of the collectors that are truly unordered were marked as such (counting, min, max, averaging, summing, summarizing), and the only collector that was marked as unordered was Collectors.toSet(), which although it is explicitly marked as unordered seems like it really shouldn’t be.
>
> Whats going on here? Which parts of all this are intended and which (if any) are bugs?
Well I think it does make sense for the toSet() collector to be UNORDERED. It
may be the case, though, that other collectors are lacking the UNORDERED
characteristic that they should have. That sounds like it might be a bug.
s'marks
>
> Thanks in advance for any enlightenment,
>
> Chris Dennis
>
> P.S. Coincidentally the unordered toSet() collector works perfectly at the moment due to the happy interaction of the Spliterator.trySplit() contract and the folding/combining behavior of the stream implementations parallel reduction algorithm.
>
More information about the core-libs-dev
mailing list