RFR: 8327156: Avoid copying in StringTable::intern(oop, TRAPS) [v6]

Casper Norrbin cnorrbin at openjdk.org
Mon Nov 11 15:06:15 UTC 2024


On Mon, 11 Nov 2024 14:45:43 GMT, Casper Norrbin <cnorrbin at openjdk.org> wrote:

>> Hi everyone,
>> 
>> String interning can be done on 4 different types of strings:
>> - oop-strings (unicode)
>> - oop-strings (latin1)
>> - Symbols (non-null-terminated utf8)
>> - null-terminated utf8 char arrays
>> 
>> Currently, when doing interning, all 4 types are first converted to unicode and copied to a jchar array. This array is used when looking in the CDS- and interning tables. If no matching string exists, this array is converted to a new string object, which is then inserted into the interning table.
>> 
>> This is less efficient than it has to be. As strings are likely to exist in the table(s), it would be beneficial to avoid the initial jchar array allocation. When inserting into the interning table, there is also a possibility to reuse the original string object, avoiding another allocation.
>> 
>> This change makes it possible to search in the tables using the different string types, avoiding that initial allocation. This is done by wrapping the string and tagging it with a type, with helper functions directing to the correct hashing/lookup/equal functions. When inserting into the table, we can now reuse the original object or go directly from the input type to an object. To do this, functionality had to be added to hash utf8-strings and to compare oop-strings with utf8. These convert utf8 into unicode character by character and operate on those, thus avoiding extra allocations.
>> 
>> Some quick rudimentary JMH benchmarks show a ~20% increase in throughput when interning the same string repeatedly, and a ~5% increase in throughput interning only unique strings. (Only tested on my local mac aarch debug build)
>> 
>> 2 new tests have also been added. The first test tests that hash codes and string equality remain consistent when converting between different string types. The second test tests that string interning works as expected when equal strings are interned from different string types.
>> Also tested and passes tiers 1-3.
>
> Casper Norrbin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   size moved to wrapper

After some investigation, I identified that the failing test mentioned above was due to complex strings in classes with older classfile versions not passing the verification check in `create_from_symbol`. The checks assume "correct UTF8", which older classfile versions do not necessarily conform to. To address this, I re-added the check for valid UTF8 I had before. This issue did not appear previously because the conversion to Unicode bypassed these checks, though the conversion process is functionally identical to that in `create_from_symbol`.

To ensure correctness, I re-ran all tests with additional asserts added to confirm that the created string objects match those created by first converting symbols to unicode. All strings were identical, and the observed behaviour was consistent with the previous implementation.
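The kind of consistency being checked here can be illustrated outside HotSpot with a minimal, standalone sketch. It is not the code from the PR: the names `decode_unit`, `hash_utf16`, `hash_utf8` and `equals_utf8_utf16` are hypothetical, the input is assumed to be valid modified UTF-8 (1- to 3-byte sequences, each mapping to exactly one UTF-16 code unit), and the hash is the usual `31 * h + c` string hash. The point is that hashing and comparing by decoding one unit at a time gives the same answer as converting to UTF-16 first, without allocating a buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode one modified-UTF-8 sequence at p into a single
// UTF-16 code unit and return the number of bytes consumed.
// Assumption: the input is valid, so no bounds or continuation-byte checks.
inline size_t decode_unit(const unsigned char* p, uint16_t* out) {
  if (p[0] < 0x80) {                      // 1-byte sequence (ASCII)
    *out = p[0];
    return 1;
  }
  if ((p[0] & 0xE0) == 0xC0) {            // 2-byte sequence
    *out = static_cast<uint16_t>(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
    return 2;
  }
  // 3-byte sequence (includes surrogate halves in modified UTF-8)
  *out = static_cast<uint16_t>(((p[0] & 0x0F) << 12) |
                               ((p[1] & 0x3F) << 6) |
                               (p[2] & 0x3F));
  return 3;
}

// Reference hash over already-converted UTF-16 code units.
inline uint32_t hash_utf16(const uint16_t* s, size_t len) {
  uint32_t h = 0;
  for (size_t i = 0; i < len; i++) {
    h = 31 * h + s[i];
  }
  return h;
}

// Same hash computed directly from UTF-8 bytes, one decoded unit at a time,
// so no intermediate UTF-16 array is needed.
inline uint32_t hash_utf8(const char* s, size_t byte_len) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
  uint32_t h = 0;
  size_t i = 0;
  while (i < byte_len) {
    uint16_t c;
    i += decode_unit(p + i, &c);
    h = 31 * h + c;
  }
  return h;
}

// Compare UTF-8 bytes against UTF-16 code units without converting either.
inline bool equals_utf8_utf16(const char* u8, size_t byte_len,
                              const uint16_t* u16, size_t len) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(u8);
  size_t i = 0;   // byte index into u8
  size_t j = 0;   // unit index into u16
  while (i < byte_len && j < len) {
    uint16_t c;
    i += decode_unit(p + i, &c);
    if (c != u16[j++]) return false;
  }
  return i == byte_len && j == len;   // both inputs fully consumed
}
```

A check in the spirit of the asserts described above would then compare `hash_utf8(bytes, n)` against `hash_utf16` of the converted string and require `equals_utf8_utf16` to hold for the same pair.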

Additionally, I updated how string lengths are handled: UTF8/ambiguous lengths now use `size_t`, and conversions to `int` are performed when calling external functions that require a Unicode length. The `equals` functions in `CompactHashtable` require an `int` length, which is incompatible with the larger UTF8 sizes, so I resolved this by storing the length within the string wrapper instead. This change allows us to pass only the wrapped string between functions, without a separate length.
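To make the length handling concrete, here is a rough standalone sketch of the idea, not the actual wrapper from the PR. `StringKind`, `WrappedString`, `to_int_length` and `wrapped_equals` are hypothetical names, and the callback shape (a stored value plus an `int` length compared against a key) is only an assumption about what such an `equals` function might look like.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical tag for the kind of data the wrapper points at.
enum class StringKind { Utf8, Utf16 };

// Hypothetical wrapper: the length travels with the string as size_t,
// so callers pass only the wrapper between functions.
struct WrappedString {
  StringKind kind;
  const void* data;
  size_t length;   // bytes for Utf8, code units for Utf16
};

// Narrow a size_t length to int only at the boundary where an int is
// required; returns false instead of truncating when it does not fit.
inline bool to_int_length(size_t n, int* out) {
  if (n > static_cast<size_t>(INT_MAX)) return false;
  *out = static_cast<int>(n);
  return true;
}

// Assumed callback shape: the table hands over a stored UTF-16 value with an
// int length, and the key's own length comes from the wrapper.
// This sketch handles only UTF-16 keys.
inline bool wrapped_equals(const uint16_t* value, int value_len,
                           const WrappedString& key) {
  if (key.kind != StringKind::Utf16) return false;
  if (value_len < 0) return false;
  if (key.length != static_cast<size_t>(value_len)) return false;
  const uint16_t* k = static_cast<const uint16_t*>(key.data);
  for (int i = 0; i < value_len; i++) {
    if (k[i] != value[i]) return false;
  }
  return true;
}
```

Keeping the `size_t` inside the wrapper means a UTF8 byte length larger than `INT_MAX` is never squeezed through an `int` parameter; the narrowing happens once, explicitly, where a Unicode length is actually needed.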

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21325#issuecomment-2468390681

