RFR: 8254598: StringDedupTable should use OopStorage

Wed Apr 28 06:48:09 UTC 2021

On Wed, 28 Apr 2021 00:44:19 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Please review this change to the String Deduplication facility.  It is being
>> changed to use OopStorage to hold weak references to relevant objects,
>> rather than bespoke data structures requiring dedicated processing phases by
>> supporting GCs.
>> 
>> (The Shenandoah update was contributed by Zhengyu Gu.)
>> 
>> This change significantly simplifies the interface between a given GC and
>> the String Deduplication facility, which should make it much easier for
>> other GCs to opt in.  However, this change does not alter the set of GCs
>> providing support; currently only G1 and Shenandoah support string
>> deduplication.  Adding support by other GCs will be in followup RFEs.
>> 
>> Reviewing via the diffs might not be all that useful for some parts, as
>> several files have been essentially completely replaced, and a number of
>> files have been added or eliminated.  The full webrev might be a better
>> place to look.
>> 
>> The major changes are in gc/shared/stringdedup/* and in the supporting
>> collectors, but there are also some smaller changes in other places, most
>> notably classfile/{stringTable,javaClasses}.
>> 
>> This change is additionally labeled for review by core-libs, although there
>> are no source changes there.  This change injects a byte field of bits into
>> java.lang.String, using one of the previously unused padding bytes in that
>> class.  (There were two unused bytes, now only one.)
>> 
>> Testing:
>> mach5 tier1-2 with and without -XX:+UseStringDeduplication
>> 
>> Locally (linux-x64) ran all of the existing tests that use string
>> deduplication with both G1 and Shenandoah.  Note that
>> TestStringDeduplicationInterned.java is disabled for shenandoah, as it
>> currently fails, for reasons I haven't figured out but suspect are test
>> related.
>> 
>> - gc/stringdedup/   -- these used to be in gc/g1
>> - runtime/cds/SharedStringsDedup.java
>> - runtime/cds/appcds/cacheObject/DifferentHeapSizes.java
>> - runtime/cds/appcds/sharedStrings/*
>> 
>> shenandoah-only:
>> - gc/shenandoah/jvmti/TestHeapDump.java
>> - gc/shenandoah/TestStringDedup.java
>> - gc/shenandoah/TestStringDedupStress.java
>> 
>> Performance tested baseline, baseline + stringdedup, new with stringdedup,
>> finding no significant differences.
>
> src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp line 557:
> 
>> 555:     // non-latin1, and deduplicating if we find a match.  For deduplication we
>> 556:     // only care if the arrays consist of the same sequence of bytes.
>> 557:     const jchar* chars = static_cast<jchar*>(value->base(T_CHAR));
> 
> The encoding of the shared strings always match CompactStrings. Otherwise the CDS archive will fail to map. E.g.,
> 
> 
> $ java -XX:-CompactStrings -Xshare:dump 
> $ java -XX:+CompactStrings -Xlog:cds -version
> ...
> [0.032s][info][cds] UseSharedSpaces: The shared archive file's CompactStrings 
>                     setting (disabled) does not equal the current CompactStrings setting (enabled).
> [0.032s][info][cds] UseSharedSpaces: Unable to map shared spaces
> ...
> 
> 
> So you can avoid this `if` block when `CompactStrings == true`. The `!java_lang_String::is_latin1(found)` check below can be changed to an assert.
> 
> Also, I think we need an explicit test case where you'd return `true` at line 564. I can write a new test case and email to you. I think it will involve dumping an archive with `-XX:-CompactStrings`.

Oh, I realized that my suggestion above is wrong. When `CompactStrings==true`, you can still have a string whose coder is UTF16:

https://github.com/openjdk/jdk/blob/75a2354dc276e107d64516d20fc72bc7ef3d5f86/src/java.base/share/classes/java/lang/String.java#L343-L351

-------------

PR: https://git.openjdk.java.net/jdk/pull/3662