RFR: 8327156: Avoid copying in StringTable::intern(oop, TRAPS) [v6]
David Holmes
dholmes at openjdk.org
Tue Nov 12 00:14:39 UTC 2024
On Mon, 11 Nov 2024 14:45:43 GMT, Casper Norrbin <cnorrbin at openjdk.org> wrote:
>> Hi everyone,
>>
>> String interning can be done on 4 different types of strings:
>> - oop-strings (unicode)
>> - oop-strings (latin1)
>> - Symbols (non-null-terminated utf8)
>> - null-terminated utf8 char arrays
>>
>> Currently, when doing interning, all 4 types are first converted to unicode and copied to a jchar array. This array is used when looking in the CDS- and interning tables. If an existing string does not exist, this array is converted to a new string object, which is then inserted into the interning table.
>>
>> This is less efficient than it has to be. As strings are likely to exist in the table(s), it would be beneficial to avoid the initial jchar array allocation. When inserting into the interning table, there is also a possibility to reuse the original string object, avoiding another allocation.
>>
>> This change makes it possible to search in the tables using the different string types, avoiding that initial allocation. This is done by wrapping the string and tagging it with a type, with helper functions directing to the correct hashing/lookup/equal functions. When inserting into the table, we can now reuse the original object or go directly from the input type to an object. To do this, functionality had to be added to hash utf8-strings and to compare oop-strings with utf8. These convert utf8 into unicode character by character and operates on those, thus avoiding needing extra allocations.
>>
>> Some quick rudimentary JMH benchmarks show a ~20% increase in throughput when interning the same string repeatedly, and a ~5% increase in throughput interning only unique strings. (Only tested on my local mac aarch debug build)
>>
>> 2 new tests have also been added. The first test tests that hash codes and string equality remain consistent when converting between different string types. The second test tests that string interning works as expected when equal strings are interned from different string types.
>> Also tested and passes tiers 1-3.
>
> Casper Norrbin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> size moved to wrapper
Looking good. I like the internalization of the length into StringWrapper.
A couple of nits/queries.
Thanks
src/hotspot/share/classfile/javaClasses.cpp line 393:
> 391: #ifdef ASSERT
> 392: // This check is too strict on older classfile versions
> 393: if (UTF8::is_legal_utf8((const unsigned char*)utf8_str, utf8_len, /*version_leq_47*/false))
I don't see the point of this check. Any validation would/should be done when the Symbol is created, not when it is later used.
src/hotspot/share/classfile/javaClasses.hpp line 184:
> 182: }
> 183:
> 184: static unsigned int hash_code(const char *utf8_str, size_t utf8_len) {
Suggestion:
static unsigned int hash_code(const char* utf8_str, size_t utf8_len) {
src/hotspot/share/classfile/stringTable.cpp line 153:
> 151: }
> 152:
> 153: bool StringTable::wrapped_string_equals(oop java_string, StringWrapper wrapped_str, int _) {
Not sure what hotspot style rules are for unused parameters. Probably worth a comment as to why this unused parameter exists.
src/hotspot/share/classfile/stringTable.cpp line 421:
> 419: switch (wrapped_str.type) {
> 420: case StringType::UnicodeStr:
> 421: return wrapped_str.unicode_str;
This doesn't set `len`.
src/hotspot/share/classfile/stringTable.cpp line 445:
> 443: }
> 444:
> 445: Handle StringTable::to_handle(StringWrapper wrapped_str, TRAPS) {
`to_handle` is a bit too generic - it's like saying `to_pointer`. You are returning the Handle form of a java.lang.String oop.
src/hotspot/share/classfile/stringTable.hpp line 83:
> 81: typedef struct StringWrapperInternal StringWrapper;
> 82:
> 83: static bool wrapped_string_equals(oop java_string, StringWrapper wrapped_str, int _ = 0);
One thing I can't quite connect the dots on here, is what gets passed as the unused length when this is called from the table code?
-------------
PR Review: https://git.openjdk.org/jdk/pull/21325#pullrequestreview-2428231859
PR Review Comment: https://git.openjdk.org/jdk/pull/21325#discussion_r1837245814
PR Review Comment: https://git.openjdk.org/jdk/pull/21325#discussion_r1837316465
PR Review Comment: https://git.openjdk.org/jdk/pull/21325#discussion_r1837247002
PR Review Comment: https://git.openjdk.org/jdk/pull/21325#discussion_r1837319659
PR Review Comment: https://git.openjdk.org/jdk/pull/21325#discussion_r1837322352
PR Review Comment: https://git.openjdk.org/jdk/pull/21325#discussion_r1837325852
More information about the hotspot-dev
mailing list