RFR: 8327156: Avoid copying in StringTable::intern(oop, TRAPS) [v5]

Casper Norrbin cnorrbin at openjdk.org
Mon Nov 11 13:59:36 UTC 2024


> Hi everyone,
> 
> String interning can be done on 4 different types of strings:
> - oop-strings (unicode)
> - oop-strings (latin1)
> - Symbols (non-null-terminated utf8)
> - null-terminated utf8 char arrays
> 
> Currently, when interning, all 4 types are first converted to Unicode and copied into a jchar array. This array is used for lookups in the CDS and interning tables. If no matching string is found, the array is converted to a new string object, which is then inserted into the interning table.
> 
> This is less efficient than it has to be. As strings are likely to exist in the table(s), it would be beneficial to avoid the initial jchar array allocation. When inserting into the interning table, there is also a possibility to reuse the original string object, avoiding another allocation.
> 
> This change makes it possible to search the tables using the different string types directly, avoiding that initial allocation. This is done by wrapping the string and tagging it with its type, with helper functions dispatching to the correct hashing/lookup/equality functions. When inserting into the table, we can now reuse the original object or go directly from the input type to an object. To do this, functionality had to be added to hash utf8 strings and to compare oop-strings with utf8. These functions convert utf8 into Unicode character by character and operate on those characters, so no extra allocations are needed.
> 
> Some quick, rudimentary JMH benchmarks show a ~20% increase in throughput when interning the same string repeatedly, and a ~5% increase in throughput when interning only unique strings. (Only tested on a debug build on my local AArch64 Mac.)
> 
> Two new tests have also been added. The first verifies that hash codes and string equality remain consistent when converting between different string types. The second verifies that string interning works as expected when equal strings are interned from different string types.
> The change also passes tiers 1-3.
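
The character-by-character utf8 hashing and comparison described above can be illustrated with a minimal sketch. This is not the code from the PR: the function names `next_unit`, `hash_utf16`, `hash_utf8` and `equals_utf16_utf8` are hypothetical, the input is assumed to be well-formed modified UTF-8 (sequences of 1 to 3 bytes, each decoding to one UTF-16 code unit), and the hash is assumed to follow the `h = 31 * h + c` recurrence of `String.hashCode`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a HotSpot function): decode one UTF-16 code unit
// from well-formed modified UTF-8 at `p` and advance `p` past it.
// No validation is done; malformed input is outside the scope of this sketch.
static char16_t next_unit(const unsigned char*& p) {
  unsigned char b0 = *p++;
  if (b0 < 0x80) return b0;                        // 1 byte:  0xxxxxxx
  unsigned char b1 = *p++;
  if ((b0 & 0xE0) == 0xC0) {                       // 2 bytes: 110xxxxx 10xxxxxx
    return static_cast<char16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F));
  }
  unsigned char b2 = *p++;                         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  return static_cast<char16_t>(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
}

// Reference hash over UTF-16 code units (assumed recurrence: h = 31 * h + c).
uint32_t hash_utf16(const char16_t* s, size_t len) {
  uint32_t h = 0;
  for (size_t i = 0; i < len; i++) h = 31 * h + s[i];
  return h;
}

// The same hash computed directly from the UTF-8 bytes, decoding one code
// unit at a time instead of first copying into a temporary UTF-16 array.
uint32_t hash_utf8(const unsigned char* s, size_t byte_len) {
  assert(s != nullptr || byte_len == 0);
  uint32_t h = 0;
  const unsigned char* end = s + byte_len;
  while (s < end) h = 31 * h + next_unit(s);
  return h;
}

// Compare a UTF-16 string with a UTF-8 string without allocating.
bool equals_utf16_utf8(const char16_t* a, size_t a_len,
                       const unsigned char* b, size_t b_byte_len) {
  const unsigned char* end = b + b_byte_len;
  size_t i = 0;
  while (i < a_len && b < end) {
    if (a[i++] != next_unit(b)) return false;
  }
  // Equal only if both inputs were consumed completely.
  return i == a_len && b == end;
}
```

Because both hash functions see the same sequence of code units, a utf8 input and its UTF-16 equivalent land in the same bucket, which is what allows a lookup to proceed without a conversion step.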

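The "wrap, tag, look up first, allocate only on a miss" flow from the description can be sketched in the same spirit. Everything here is an assumption made for illustration: `StrType`, `StringRef` and `InternTable` are hypothetical names, only two of the four input types are modelled, and a `std::unordered_map` of buckets stands in for the real CDS and interning tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical tag for the kind of data a wrapper points at.
enum class StrType { Utf16, Latin1 };

// Hypothetical non-owning wrapper: a tag, a pointer and a length.
struct StringRef {
  StrType type;
  const void* data;
  size_t length;  // number of code units

  // Dispatch on the tag to read one code unit without copying the string.
  char16_t at(size_t i) const {
    assert(i < length);
    if (type == StrType::Utf16) return static_cast<const char16_t*>(data)[i];
    return static_cast<const unsigned char*>(data)[i];
  }
};

// Hash that depends only on the code units, not on the input type.
uint32_t hash_of(const StringRef& s) {
  uint32_t h = 0;
  for (size_t i = 0; i < s.length; i++) h = 31 * h + s.at(i);
  return h;
}

bool equals(const StringRef& s, const std::u16string& t) {
  if (s.length != t.size()) return false;
  for (size_t i = 0; i < s.length; i++) {
    if (s.at(i) != t[i]) return false;
  }
  return true;
}

// Toy interning table: looks up through the wrapper and allocates an owned
// string only when no equal entry exists.
class InternTable {
 public:
  const std::u16string* intern(const StringRef& s) {
    auto& bucket = buckets_[hash_of(s)];
    for (const auto& entry : bucket) {
      if (equals(s, *entry)) return entry.get();  // hit: nothing allocated
    }
    auto owned = std::make_unique<std::u16string>();
    owned->reserve(s.length);
    for (size_t i = 0; i < s.length; i++) owned->push_back(s.at(i));
    allocations_++;
    bucket.push_back(std::move(owned));
    return bucket.back().get();
  }

  size_t allocations() const { return allocations_; }

 private:
  std::unordered_map<uint32_t, std::vector<std::unique_ptr<std::u16string>>> buckets_;
  size_t allocations_ = 0;
};
```

Interning `"abc"` once as Latin-1 and once as UTF-16 through this toy table returns the same pointer and performs a single allocation, which mirrors the property the second new test is described as checking.
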
Casper Norrbin has updated the pull request incrementally with 834 additional commits since the last revision:

 - missed cast
 - size field moved to string wrapper
 - 8341408: Implement JEP 488: Primitive Types in Patterns, instanceof, and switch (Second Preview)
   
   Reviewed-by: vromero, jlahoda
 - 8341834: C2 compilation fails with "bad AD file" due to Replicate
   
   Reviewed-by: kvn, epeter
 - 8343068: C2: CastX2P Ideal transformation not always applied
   
   Reviewed-by: kvn, thartmann
 - 8339303: C2: dead node after failing to match cloned address expression
   
   Reviewed-by: vlivanov, kvn
 - 8331341: secondary_super_cache does not scale well: C1 and interpreter
   
   Reviewed-by: vlivanov, kvn, dlong
 - 8318442: java/net/httpclient/ManyRequests2.java fails intermittently on Linux
   
   Reviewed-by: mdoerr, lucy, dfuchs
 - 8343502: RISC-V: SIGBUS in updateBytesCRC32 after JDK-8339738
   
   Reviewed-by: mli, fjiang
 - 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor
   
   Reviewed-by: roland, kvn
 - ... and 824 more: https://git.openjdk.org/jdk/compare/cc7530cd...d292368f

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21325/files
  - new: https://git.openjdk.org/jdk/pull/21325/files/cc7530cd..d292368f

Webrevs:
 - full: Webrev is not available because diff is too large
 - incr: Webrev is not available because diff is too large

  Stats: 473839 lines in 4140 files changed: 353094 ins; 85714 del; 35031 mod
  Patch: https://git.openjdk.org/jdk/pull/21325.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21325/head:pull/21325

PR: https://git.openjdk.org/jdk/pull/21325

