The non-deterministic iteration order of Immutable Collections
Stuart Marks
stuart.marks at oracle.com
Sat Mar 25 00:16:39 UTC 2023
On 3/24/23 10:18 AM, Chris Hegarty wrote:
> I know that this has come up a number of times before, but I cannot seem to find
> anything directly relevant to my use-case, or recent.
>
> The iteration order of immutable Sets and Maps (created by the `of` factories) is
> clearly unspecified - good. Code should not depend upon this iteration order, and if
> so that code is broken. I completely agree and am aligned with this principle.
>
> The Elasticsearch code base heavily uses these collection implementations, and their
> iterators. On occasion we run into issues, where our code (because of a bug or a
> race), inadvertently has a dependency on the iteration order. To be clear, this is a
> bug in our code, and we want to reproduce and fix it. The problem we are
> encountering is that the iteration order is not determinable or reproducible, which
> makes it extremely challenging to reproduce the bug even when we have reproducible
> testcase to hand. ( whereas, for example, it is common practice in other parts of
> the code to log a seed used for randomization )
>
> We very much like using the immutable collections, and we definitely don't want to
> pay the price of sorting things in the code, since we don't want to, and should not,
> depend upon the iteration order. The only concern is with reproducibility when we
> run into (our own) bugs.
>
> I don't (yet) want to be prescriptive in any potential solution. And I know that
> this has been discussed before. I mostly just want to start a conversation and see
> how much traction it gets.
Hi Chris,
Yes, this has come up before, but it's been mostly theoretical. That is, people
worry about this when they hear of the idea of randomized iteration order, but I've
never heard any followup. This is in fact the first time I've heard of an actual
case where this is a real problem. So thanks for bringing it up.
(And unfortunately, it's notoriously difficult to make code truly iteration-order
dependent. We've had our own history of problems with this in the JDK. I'd be
interested in hearing from you at some point about the exact pathology of how this
occurred.)
There's currently no debugging or experimental interface that will let one print or
set the random seed that's in use. The obvious approach of using an agent to hack
away at runtime doesn't work, because the unmodifiable collections implementations
are used very early in startup and they're usually loaded before the agent runs.
There are limitations on what an agent can do when redefining an already-loaded
class, for example, removing the 'final' modifier from fields isn't supported. (Well
I suppose one can always use Unsafe.)
Here's another approach that might work for you.
1. Write a little program that extracts the class bytes of ImmutableCollection.class
from the runtime image, and use something like ASM or ClassFile to make the class
public and to make the REVERSE and SALT32L fields public and non-final.
2. Launch a VM using the --patch-module option to use this class instead of the
built-in one.
3. Write an agent, or some debugging library that's called at the right time, to use
simple reflective access to get or set those fields as desired.
This is a bit fiddly but it might be easier than rebuilding and deploying a custom JDK.
Setting (formerly) final fields after the VM is initialized is often somewhat risky.
In this case these particular fields are used only during iteration, and they don't
actually change the layout of any data structures. So setting them to some desired
value should apply to all subsequent iterations, even of existing data structures.
I'll think about better ways to do this in the product. The best approach isn't
obvious. The typical way of doing things like this using system properties is
tricky, as it depends on order of class initialization at startup (and you know how
fragile that is).
s'marks
More information about the core-libs-dev
mailing list