<i18n dev> RFR: 8309622: Re-examine the cache mechanism in BaseLocale [v3]

Brett Okken duke at openjdk.org
Fri Jun 16 12:49:06 UTC 2023


On Tue, 13 Jun 2023 17:56:57 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> This is stemming from the PR: https://github.com/openjdk/jdk/pull/14211 where aggressive GC can cause NPE in `BaseLocale$Key` class. I refactored the in-house cache with WeakHashMap, and removed the Key class as it is no longer needed (thus the original NPE will no longer be possible). Also with the new JMH test case, it gains some performance improvement:
>> 
>> (w/o fix)
>> 
>> Benchmark                       Mode  Cnt      Score     Error  Units
>> LocaleCache.testForLanguageTag  avgt   20   5781.275 ± 569.580  ns/op
>> LocaleCache.testLocaleOf        avgt   20  62564.079 ± 406.697  ns/op
>> 
>> (w/ fix)
>> Benchmark                       Mode  Cnt      Score     Error  Units
>> LocaleCache.testForLanguageTag  avgt   20   4801.175 ± 371.830  ns/op
>> LocaleCache.testLocaleOf        avgt   20  60394.652 ± 352.471  ns/op
>
> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Addressing review comments

src/java.base/share/classes/sun/util/locale/BaseLocale.java line 168:

> 166:         // BaseLocale as the key. The returned "normalized" instance
> 167:         // can subsequently be used by the Locale instance which
> 168:         // guarantees the locale components are properly cased/interned.

Is it truly important that the String values are interned, or is the desire simply that all refer to the same "canonical" String instance?
my thought is that managing the canonical String instances could avoid WeakReferences and synchronized map access and just return a new BaseLocale each time.




    public static BaseLocale getInstance(String language, String script,
                                         String region, String variant) {
        if (script == null) {
            script = "";
        } else {
            script = getCanonicalString(script, LocaleUtils::toTitleString);
        }

        if (region == null) {
            region = "";
        } else {
            region = getCanonicalString(region, LocaleUtils::toUpperString);
        }

        if (language == null) {
            language = "";
        } else {
            language = getCanonicalString(language, LocaleUtils::toLowerString);
        }
        
        if (variant == null) {
            variant = "";
        } else {
            variant = getCanonicalString(variant, Function.identity());
        }
        ...
        return new BaseLocale(language, script, region, variant);
    }



    private static final ConcurrentMap<String, String> CANONICALIZED_STRINGS = new ConcurrentHashMap<>();

    static {
        // prime the old iso codes
        CANONICALIZED_STRINGS.put("he", "he");
        CANONICALIZED_STRINGS.put("iw", "iw");
        CANONICALIZED_STRINGS.put("id", "id");
        CANONICALIZED_STRINGS.put("in", "in");
        CANONICALIZED_STRINGS.put("ji", "ji");
        CANONICALIZED_STRINGS.put("yi", "yi");
    }
    
    private static String getCanonicalString(String value, Function<String, String> conversion) {
        String existing = CANONICALIZED_STRINGS.get(value);
        if (existing != null) {
            return existing;
        }

        String converted = conversion.apply(value);
        return CANONICALIZED_STRINGS.merge(converted, converted, (k,v) -> v);
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14404#discussion_r1232205527


More information about the i18n-dev mailing list