The non-deterministic iteration order of Immutable Collections

Rob Spoor openjdk at icemanx.nl
Fri Mar 24 18:25:43 UTC 2023


Check these:
* 
https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Map.html#unmodifiable: 
"The iteration order of mappings is unspecified and is subject to change."
* 
https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Set.html#unmodifiable: 
"The iteration order of set elements is unspecified and is subject to 
change."

Unlike HashSet, the implementation will have a different iteration order 
for the same content if you restart the JVM, as stated here: 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/ImmutableCollections.java#L54
In other words, it not only discourages callers to not rely on iteration 
order, it punishes them; any code that relies on it will break after the 
JVM is restarted. The problem the Elasticsearch team has is *finding* 
these errors.


On 24/03/2023 18:53, Holo The Sage Wolf wrote:
> It is highly dependent on the exact implementation, but the situation you
> are describing us weird.
> 
> In most cases the order *is* deterministic for a read-only view (which is
> the case in, for example, HashMap (and hence for HashSet)).
> 
> In a deterministic implementation the only cases where the iteration order
> will change is if you modify the collection, change the implementation
> (e.g. downcast, or change version), or, in rare cases, when you copy the
> collection.
> 
> If you are using non-determinsitic implementation (which I do not know if
> such implementation exists in the std) then the means of reproduction would
> probably comes down to set a seed correctly
> 
> On Fri, Mar 24, 2023, 20:19 Chris Hegarty <chegar999 at gmail.com> wrote:
> 
>> I know that this has come up a number of times before, but I cannot seem
>> to find anything directly relevant to my use-case, or recent.
>>
>> The iteration order of immutable Sets and Maps (created by the `of`
>> factories) is clearly unspecified - good. Code should not depend upon
>> this iteration order, and if so that code is broken. I completely agree
>> and am aligned with this principle.
>>
>> The Elasticsearch code base heavily uses these collection
>> implementations, and their iterators. On occasion we run into issues,
>> where our code (because of a bug or a race), inadvertently has a
>> dependency on the iteration order. To be clear, this is a bug in our
>> code, and we want to reproduce and fix it.  The problem we are
>> encountering is that the iteration order is not determinable or
>> reproducible, which makes it extremely challenging to reproduce the bug
>> even when we have reproducible testcase to hand. ( whereas, for example,
>> it is common practice in other parts of the code to log a seed used for
>> randomization )
>>
>> We very much like using the immutable collections, and we definitely
>> don't want to pay the price of sorting things in the code, since we
>> don't want to, and should not, depend upon the iteration order. The only
>> concern is with reproducibility when we run into (our own) bugs.
>>
>> I don't (yet) want to be prescriptive in any potential solution. And I
>> know that this has been discussed before. I mostly just want to start a
>> conversation and see how much traction it gets.
>>
>> Thanks,
>> -Chris.
>>
> 



More information about the core-libs-dev mailing list