RFR: 8330027: Identity hashes of archived objects must be based on a reproducible random seed [v3]

Ioi Lam iklam at openjdk.org
Wed May 1 17:49:53 UTC 2024


On Wed, 24 Apr 2024 16:27:36 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>>> > > > > I get that the chance for this happening is remote, but hunting sources of entropy is frustrating work, and the patch is really very simple. So, why not fix it? I don't share the opinion that this is added complexity.
>>> > > > 
>>> > > > 
>>> > > > Why not do it inside `Thread::Thread()`
>>> > > > ```
>>> > > > // thread-specific hashCode stream generator state - Marsaglia shift-xor form
>>> > > >   if (CDSConfig::is_dumping_static_archive()) {
>>> > > >      _hashStateX = 0;
>>> > > >   } else {
>>> > > >      _hashStateX = os::random();
>>> > > >   }  
>>> > > > ```
>>> > > 
>>> > > 
>>> > > Because then it would inject `os::random` into the startup of every thread, not just of every thread that generates iHashes. So it would also fire for GC threads and other thread started before "our" threads. That would make our random sequence against order and number of threads started.
>>> > 
>>> > 
>>> > My last answer was rubbish, sorry, did not read your comment carefully enough.
>>> > Yes, your approach would also work, but it would lead to the two threads involved in dumping the archive - VMthread and the one Java thread - using the same seed, hence generating the same sequence of ihashes. That, in turn, can lead to different archived objects carrying the same ihash, which may negatively impact performance later when the archive is used.
>>> 
>>> I think it's better to just not compute the identity hash inside the VM thread. Here's what I tried
>>> 
>>> [iklam at ad95e2e](https://github.com/iklam/jdk/commit/ad95e2e8b00cb151617463af41648cdece2dfc7b)
>>> 
>>> We thought that forcing the identity hash computation would increase sharing across processes, as it would mean fewer updates of the object headers during run time. However, most of the heap objects in the CDS archive are not accessible by the application (they are part of the archived module graph, etc). Also the archive contains a large number of Strings, which are unlikely to need the identity hash (String has its own hashcode() method).
>>> 
>>> Since the reason is rather dubious, I think it's better to remove it and simplify the system.
>> 
>> I like that, that is simpler. Okay, then we will only call ihash from a single thread, so a global constant seed should be fine. I should be able to assert that, right?
>
>> > I think it's better to just not compute the identity hash inside the VM thread. Here's what I tried
>> > [iklam at ad95e2e](https://github.com/iklam/jdk/commit/ad95e2e8b00cb151617463af41648cdece2dfc7b)
>> > We thought that forcing the identity hash computation would increase sharing across processes, as it would mean fewer updates of the object headers during run time. However, most of the heap objects in the CDS archive are not accessible by the application (they are part of the archived module graph, etc). Also the archive contains a large number of Strings, which are unlikely to need the identity hash (String has its own hashcode() method).
>> > Since the reason is rather dubious, I think it's better to remove it and simplify the system.
>> 
>> I like that, that is simpler. Okay, then we will only call ihash from a single thread, so a global constant seed should be fine. I should be able to assert that, right?
> 
> AI think an assert can be added, since we don't allow any Java threads to be launched. So even test cases that run arbitrary Java code during -Xshare:dump (using Java agents of -XX:ArchiveHeapTestClass) will not be able to run any Java code outside of the main Java thread.

> @iklam Sorry, had to force push because I messed up the PR branch somehow.

I think it's possible to just discard your local branch, and check out a new version from the PR, then make changes on top of it. That way you can avoid forced pushes.

> The new version contains the change proposed by you - getting rid of explicit ihash generation when dumping - as well as a simplified version of my original patch. We now are back at generating the ihash seed in Thread::Thread(), with a constant if CDS dumps, and on ihash generation we check that we ever only call it with a single thread.

This version looks good. I am running our tiers 1-4 just to be sure.

Thanks

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18735#issuecomment-2088832536


More information about the hotspot-runtime-dev mailing list