RFR: 8330027: Identity hashes of archived objects must be based on a reproducible random seed [v3]
Ioi Lam
iklam at openjdk.org
Tue Apr 23 17:21:29 UTC 2024
On Tue, 23 Apr 2024 05:46:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> > > > I get that the chance for this happening is remote, but hunting sources of entropy is frustrating work, and the patch is really very simple. So, why not fix it? I don't share the opinion that this is added complexity.
> > >
> > >
> > > Why not do it inside `Thread::Thread()`?
> > > ```
> > > // thread-specific hashCode stream generator state - Marsaglia shift-xor form
> > > if (CDSConfig::is_dumping_static_archive()) {
> > >   _hashStateX = 0;
> > > } else {
> > >   _hashStateX = os::random();
> > > }
> > > ```
> >
> >
> > > Because then it would inject `os::random` into the startup of every thread, not just of every thread that generates iHashes. So it would also fire for GC threads and other threads started before "our" threads. That would make our random sequence dependent on the order and number of threads started.
>
> My last answer was rubbish, sorry, did not read your comment carefully enough.
>
> Yes, your approach would also work, but it would lead to the two threads involved in dumping the archive - the VM thread and the one Java thread - using the same seed, hence generating the same sequence of ihashes. That, in turn, can lead to different archived objects carrying the same ihash, which may negatively impact performance later when the archive is used.
I think it's better to just not compute the identity hash inside the VM thread. Here's what I tried:
https://github.com/iklam/jdk/commit/ad95e2e8b00cb151617463af41648cdece2dfc7b
We thought that forcing the identity hash computation at dump time would increase sharing across processes, as it would mean fewer updates of the object headers at run time. However, most of the heap objects in the CDS archive are not accessible by the application (they are part of the archived module graph, etc.). Also, the archive contains a large number of Strings, which are unlikely to need the identity hash (String has its own hashCode() method).
Since that rationale is rather dubious, I think it's better to remove the forced computation and simplify the system.
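
To make the duplicate-seed concern quoted above concrete, here is a minimal standalone sketch. It is not HotSpot code: `HashState`, `make_state`, `next_hash` and `same_sequence` are hypothetical names, and the constants for the non-seeded words are illustrative assumptions. It only shows that two Marsaglia shift-xor generators started from the same seed produce identical hash sequences.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-thread hash state (not the HotSpot Thread class).
struct HashState {
  uint32_t x, y, z, w;
};

// Only x is seeded; the other three words are fixed illustrative constants (assumption).
inline HashState make_state(uint32_t seed) {
  return HashState{seed, 842502087u, 0x8767u, 273326509u};
}

// One step of a Marsaglia shift-xor generator over the four state words.
inline uint32_t next_hash(HashState& s) {
  uint32_t t = s.x;
  t ^= (t << 11);
  s.x = s.y;
  s.y = s.z;
  s.z = s.w;
  uint32_t v = s.w;
  v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
  s.w = v;
  return v;
}

// Returns true if both generators yield the same first n values.
// The states are taken by value, so the callers' copies are not advanced.
inline bool same_sequence(HashState a, HashState b, int n) {
  for (int i = 0; i < n; i++) {
    if (next_hash(a) != next_hash(b)) return false;
  }
  return true;
}
```

In this sketch, `same_sequence(make_state(0), make_state(0), 1000)` is true, which is the situation two threads would be in if both started with `_hashStateX = 0`, while seeds 0 and 1 already differ on the first value.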
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18735#issuecomment-2072972651