RFR: 8254598: StringDedupTable should use OopStorage

Thomas Schatzl tschatzl at openjdk.java.net
Tue May 4 12:21:55 UTC 2021


On Fri, 23 Apr 2021 19:48:47 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change to the String Deduplication facility.  It is being
> changed to use OopStorage to hold weak references to relevant objects,
> rather than bespoke data structures requiring dedicated processing phases by
> supporting GCs.
> 
> (The Shenandoah update was contributed by Zhengyu Gu.)
> 
> This change significantly simplifies the interface between a given GC and
> the String Deduplication facility, which should make it much easier for
> other GCs to opt in.  However, this change does not alter the set of GCs
> providing support; currently only G1 and Shenandoah support string
> deduplication.  Adding support by other GCs will be in followup RFEs.
> 
> Reviewing via the diffs might not be all that useful for some parts, as
> several files have been essentially completely replaced, and a number of
> files have been added or eliminated.  The full webrev might be a better
> place to look.
> 
> The major changes are in gc/shared/stringdedup/* and in the supporting
> collectors, but there are also some smaller changes in other places, most
> notably classfile/{stringTable,javaClasses}.
> 
> This change is additionally labeled for review by core-libs, although there
> are no source changes there.  This change injects a byte field of bits into
> java.lang.String, using one of the previously unused padding bytes in that
> class.  (There were two unused bytes, now only one.)
> 
> Testing:
> mach5 tier1-2 with and without -XX:+UseStringDeduplication
> 
> Locally (linux-x64) ran all of the existing tests that use string
> deduplication with both G1 and Shenandoah.  Note that
> TestStringDeduplicationInterned.java is disabled for shenandoah, as it
> currently fails, for reasons I haven't figured out but suspect are test
> related.
> 
> - gc/stringdedup/   -- these used to be in gc/g1
> - runtime/cds/SharedStringsDedup.java
> - runtime/cds/appcds/cacheObject/DifferentHeapSizes.java
> - runtime/cds/appcds/sharedStrings/*
> 
> shenandoah-only:
> - gc/shenandoah/jvmti/TestHeapDump.java
> - gc/shenandoah/TestStringDedup.java
> - gc/shenandoah/TestStringDedupStress.java
> 
> Performance tested baseline, baseline + stringdedup, new with stringdedup,
> finding no significant differences.

First pass, just comment suggestions for now.

src/hotspot/share/classfile/javaClasses.hpp line 119:

> 117:   static const uint8_t _deduplication_requested_mask = 1 << 1;
> 118: 
> 119:   static int flags_offset() { CHECK_INIT(_flags_offset); }

Maybe add some description about the injected `flags` field and its contents here.

src/hotspot/share/classfile/javaClasses.hpp line 152:

> 150:   static inline void set_value(oop string, typeArrayOop buffer);
> 151: 
> 152:   // Set the no_deduplication flag true.  This flag is sticky; once set it

Initially I was a bit confused of the use of "flag" here and that the field is called "flags". Maybe use "deduplication bit" here, or rename the "flags" field but feel free to ignore.

src/hotspot/share/classfile/javaClasses.hpp line 154:

> 152:   // Set the no_deduplication flag true.  This flag is sticky; once set it
> 153:   // never gets cleared.  This is set when a string is interned in the
> 154:   // StringTable, to prevent string deduplication from changing the string's

I think this information about the contents of the injected "flags" should be collected somewhere else as piecing together what fields "flags" has from different method documentation seems wrong.

src/hotspot/share/classfile/javaClasses.hpp line 170:

> 168:   static inline bool hash_is_set(oop string);
> 169:   static inline bool is_latin1(oop java_string);
> 170:   static inline bool no_deduplication(oop java_string);

That identifier is missing a verb to read better, but I do not have a good idea. Maybe it would be easier to use the negation of "no_deduplication", something like "may_deduplicate"?
Feel free to ignore.

src/hotspot/share/classfile/javaClasses.hpp line 171:

> 169:   static inline bool is_latin1(oop java_string);
> 170:   static inline bool no_deduplication(oop java_string);
> 171:   static inline bool deduplication_requested(oop java_string);

Having a verb at the beginning like `is_deduplication_requested` sounds better.

src/hotspot/share/gc/shared/stringdedup/stringDedup.hpp line 31:

> 29: //
> 30: // String deduplication aims to reduce the heap live-set by deduplicating
> 31: // identical instances of String so that they share the same backing byte

... by deduplicating instances of (java.lang.)String with identical backing byte array (the String's value).

src/hotspot/share/gc/shared/stringdedup/stringDedup.hpp line 69:

> 67: // arrays.  This permits reclamation of arrays that become unused.  This is
> 68: // separate from the request storage objects because dead count tracking is
> 69: // used by the table implementation as part of resizing decisions and for

s/table/string table? I.e. which table is referred to here?

src/hotspot/share/gc/shared/stringdedup/stringDedup.hpp line 84:

> 82: // interned string might later become a deduplication candidate through the
> 83: // normal GC discovery mechanism.  This is handled by setting the
> 84: // no_deduplication flag in a string before interning it.  A string with

s/string/String

Maybe:

s/This is handled.../Deduplication is prevented by setting the no_deduplication flag in the String before.../

-------------

PR: https://git.openjdk.java.net/jdk/pull/3662


More information about the core-libs-dev mailing list