RFR: 8330027: Identity hashes of archived objects must be based on a reproducible random seed [v3]
Thomas Stuefe
stuefe at openjdk.org
Fri Apr 19 06:30:57 UTC 2024
On Thu, 18 Apr 2024 18:59:37 GMT, Ioi Lam <iklam at openjdk.org> wrote:
> > @iklam @calvinccheung could you take another look please?
> > I rewrote this patch to be both minimally invasive and as bulletproof against concurrent activity as possible.
> > My thoughts on this:
> >
> > * Just initializing the global seed of os::random with a constant makes the ihash vulnerable to concurrent calls to os::random in unrelated threads. At the minimum, it makes us vulnerable to the order and number of thread starts, since each thread constructor calls os::random to initialize its ihash seed.
> > * My first patch version constified the ihash seed in the Thread constructor, but that still leaves us somewhat vulnerable to the same problem.
> > * This version - also the simplest in my opinion - makes ihash seed generation lazy, on demand, the first time a thread generates an ihash. That is the most robust version: when dumping, only two threads ever generate ihashes - the initial Java thread and the VM thread. Since both run sequentially, not concurrently, the order of ihash generation is deterministic, and since we restrict the seed initialization to the threads that actually do generate ihashes, we can be reasonably sure of getting the same random numbers.
>
> This PR doesn't help CDS in terms of making the contents of archived heap objects deterministic.
>
It does though, demonstrably, provided one also fixes the "os::random seed constant" problem. Even once that is fixed, archives are still non-deterministic for the reason stated in the PR description, and this PR makes the archive deterministic again. One could argue whether this is the simplest solution (and we do that below), but saying it does not help is just wrong, sorry.
> The dumping of archived heap objects is very sensitive to (Java thread) context switching: if two concurrent Java threads call `Object.identityHashcode()` on the same object, then the hashcode inside the header of this object will have non-deterministic values. We cannot easily recover from this with post-processing, as the shapes of the archived hashtables are influenced by the hashcode, and the CDS code doesn't know how to repack the hashtables in Java.
>
> As a result, during `java -Xshare:dump`, we disable the launching of all new Java threads, so there's only a single (main) Java thread running the whole time
>
> https://github.com/openjdk/jdk/blob/235ba9a7025964b1e43149c7102e26b82b2081ad/src/hotspot/share/prims/jvm.cpp#L2948-L2963
>
> Even if we apply this PR, we still cannot run more than one Java thread. Conversely, if we stick to a single Java thread, then the complexity in this PR is not needed.
I get all that.
But the problem is not restricted to that one Java thread.
We create ihashes from the single Java thread as well as from the VM thread. They don't run concurrently, so their order of ihash creation calls is the same across two subsequent runs of the same JVM.
Okay.
We seed each thread's ihash RNG in Thread::Thread(). We seed it with `os::random`, which itself works off a global seed.
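For illustration, the status quo looks roughly like this (heavily simplified; `simplified_os_random`, `SimplifiedThread` and `_ihash_seed` are placeholder names for this sketch, not the actual HotSpot identifiers):

```cpp
// Heavily simplified sketch of the status quo; not the actual HotSpot code.

static int g_random_seed = 1;            // one global seed shared by all callers
                                         // (the real code updates it atomically)

static int simplified_os_random() {
  // Park-Miller style generator advancing the global seed
  long long next = ((long long)g_random_seed * 16807LL) % 2147483647LL;
  g_random_seed = (int)next;
  return (int)next;
}

class SimplifiedThread {
  int _ihash_seed;                       // stand-in for the per-thread hash state
public:
  SimplifiedThread() {
    // Every thread construction advances the *global* sequence,
    // whether or not the thread will ever compute an identity hash.
    _ihash_seed = simplified_os_random();
  }
};
```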
Okay, we can make that global seed constant too.
But then we are betting that *the number of os::random calls happening before the start of the Java thread and the VM thread is constant from run to run*. It often is, but that is not guaranteed by any means. If it differs, the VM thread and the one Java thread get different seeds and will therefore generate different ihashes.
The order of os::random() calls can change for multiple reasons (a standalone illustration follows this list):
- Any variation in runtime conditions can take different code paths in which os::random is called more or fewer times.
- Concurrent threads could call os::random. There *are* concurrent threads, just no Java threads.
- Since we call os::random in Thread::Thread(), any fluctuation in the number of threads started before the VM thread or the Java thread adds to the number of prior os::random calls, thus shifting the RNG sequence.
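To illustrate that last point, here is a tiny standalone toy program, unrelated to the actual HotSpot sources, showing how one extra draw from a shared-seed PRNG changes the seeds that later constructors receive:

```cpp
// Toy demonstration: one extra draw from a shared-seed PRNG before the
// "interesting" threads are constructed shifts every seed they receive.
// All names are invented for this example.
#include <cstdio>

static long long g_seed = 42;            // pretend this is the constant global seed

static int fake_os_random() {
  g_seed = (g_seed * 16807) % 2147483647;
  return (int)g_seed;
}

int main() {
  // Run A: the VM thread and the Java thread draw first and second.
  g_seed = 42;
  int vm_a   = fake_os_random();
  int java_a = fake_os_random();

  // Run B: one unrelated caller (e.g. an extra early thread start) draws first.
  g_seed = 42;
  (void)fake_os_random();
  int vm_b   = fake_os_random();
  int java_b = fake_os_random();

  printf("run A: vm=%d java=%d\n", vm_a, java_a);
  printf("run B: vm=%d java=%d\n", vm_b, java_b);   // different seeds than run A
  return 0;
}
```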
I get that the chance of this happening is remote, but hunting down sources of entropy is frustrating work, and the patch is really very simple. So, why not fix it? I don't share the opinion that this is added complexity.
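For reference, the lazy seeding I described above looks roughly like this (a minimal sketch with invented names, not the actual patch):

```cpp
// Minimal sketch of lazy, on-demand ihash seeding; names are invented.

static long long g_seed = 42;                  // constant global seed at dump time

static int simplified_os_random() {            // same toy generator as above
  g_seed = (g_seed * 16807) % 2147483647;
  return (int)g_seed;
}

class SimplifiedThread {
  int _ihash_seed = 0;                         // 0 means "not yet seeded"
public:
  int ihash_seed() {
    if (_ihash_seed == 0) {
      // Seed lazily, on the first identity hash this thread generates.
      // At -Xshare:dump time only the initial Java thread and the VM
      // thread ever reach this point, and they run one after the other,
      // so the order of draws from the global RNG is deterministic.
      _ihash_seed = simplified_os_random();
      if (_ihash_seed == 0) _ihash_seed = 1;   // keep 0 reserved as "unset"
    }
    return _ihash_seed;
  }
};
```

With this, a thread that never generates an ihash never touches the global RNG at all, so its construction cannot perturb the sequence seen by the two threads that matter during dumping.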
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18735#issuecomment-2065846909
More information about the hotspot-runtime-dev
mailing list