<i18n dev> RFR: 8306031: Update IANA Language Subtag Registry to Version 2023-04-13 [v4]

Naoto Sato naoto at openjdk.org
Thu Apr 20 17:13:50 UTC 2023


On Thu, 20 Apr 2023 06:36:47 GMT, Justin Lu <jlu at openjdk.org> wrote:

>> Update the registry and accompanying tests with the **IANA 4/13/2023** update.
>> 
>> This update introduces the case where an IANA entry can have a preferred value, but that preferred value has a preferred value as well.
>> 
>> This causes unexpected failures in JDK tests because of how locale equivalencies are created.
>> 
>> eg: `ar-ajp` has a preferred value of `ajp` but `ajp` has a preferred value of `apc`
>> 
>> Normally, when the JDK is built, the generated _LocaleEquivalentMaps.java_ contains the following:
>> 
>> 
>> ...
>> singleEquivMap.put("ar-ajp", "ajp");
>> singleEquivMap.put("ajp", "ar-ajp");
>> ...
>> multiEquivsMap.put("ajp", new String[] {"apc", "ar-apc"});
>> multiEquivsMap.put("apc", new String[] {"ajp", "ar-apc"});
>> multiEquivsMap.put("ar-apc", new String[] {"apc", "ajp"});
>> ...
>> 
>> 
>> When `LocaleMatcher.parse(ACCEPT_LANGUAGE)` is called with `ACCEPT_LANGUAGE` containing `apc` and `ajp` in that order, the following occurs:
>> 
>> `apc` is found, `apc` is added, all of `apc's` equivalencies are added: `ajp` and `ar-apc`
>> 
>> When parse iterates to `ajp`, it finds that `ajp` is already in the list, so it does not add its equivalency `ar-ajp`.
>> 
>> To address this, the build process must be adjusted so that the equivalencies are built as:
>> 
>> 
>> ...
>> multiEquivsMap.put("ajp", new String[] {"apc", "ar-ajp", "ar-apc"});
>> multiEquivsMap.put("apc", new String[] {"ajp", "ar-ajp", "ar-apc"});
>> multiEquivsMap.put("ar-ajp", new String[] {"apc", "ajp", "ar-apc"});
>> multiEquivsMap.put("ar-apc", new String[] {"apc", "ajp", "ar-ajp"});
>> ...
>> 
>> 
>> The reasoning is that if `ar-ajp` has a preferred value of `ajp`, and `ajp` has a preferred value of `apc`, then `ar-ajp` is also equivalent to `apc` and to all of `apc`'s equivalencies. This way, when `LocaleMatcher.parse(ACCEPT_LANGUAGE)` iterates to `apc`, it adds all of its equivalencies, including `ar-ajp`.
>
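The chained case in the quoted description can be illustrated with a minimal sketch. It is not the `EquivMapsGenerator` code: the class and method names are hypothetical, and the `ar-apc` to `apc` mapping is assumed from the `multiEquivsMap` entries quoted above. Each tag is resolved to the end of its preferred-value chain, tags sharing an end point are grouped, and every tag then lists the rest of its group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not the JDK build tool.
final class PreferredValueChains {

    // Follow preferred-value links until reaching a tag that has none.
    static String resolve(Map<String, String> preferred, String tag) {
        String current = tag;
        int hops = 0;
        while (preferred.containsKey(current)) {
            // A chain longer than the map itself can only be a cycle.
            if (++hops > preferred.size()) {
                throw new IllegalStateException("Cycle in preferred values: " + tag);
            }
            current = preferred.get(current);
        }
        return current;
    }

    // Map every tag to the other members of its equivalence group (sorted).
    static Map<String, List<String>> equivalents(Map<String, String> preferred) {
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        for (String tag : preferred.keySet()) {
            String root = resolve(preferred, tag);
            TreeSet<String> group = groups.computeIfAbsent(root, k -> new TreeSet<>());
            group.add(root);
            group.add(tag);
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (TreeSet<String> group : groups.values()) {
            for (String tag : group) {
                List<String> others = new ArrayList<>(group);
                others.remove(tag);
                result.put(tag, others);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input, taken from the example in the description.
        Map<String, String> preferred = Map.of(
                "ar-ajp", "ajp",
                "ajp", "apc",
                "ar-apc", "apc");
        equivalents(preferred).forEach(
                (tag, others) -> System.out.println(tag + " -> " + others));
    }
}
```

With this input all four tags land in one group, so `apc` lists `ar-ajp` among its equivalents. The ordering here is alphabetical and is not meant to match the order the generator emits.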
> Justin Lu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Hash map should be initialized with numMappings

Much better. Some comments follow.

make/jdk/src/classes/build/tools/generatelsrequivmaps/EquivMapsGenerator.java line 40:

> 38: import java.util.Locale;
> 39: import java.util.Map;
> 40: import java.util.regex.Pattern;

Nit: sort the imports

make/jdk/src/classes/build/tools/generatelsrequivmaps/EquivMapsGenerator.java line 143:

> 141:                 // eg: ar-ajp has pref ajp which has pref apc
> 142:                 boolean foundInOther = false;
> 143:                 Pattern pattern = Pattern.compile("\\b"+preferred+"\\b");

I think we should explicitly find a pattern with `,` instead of the word boundary. Otherwise, it could match `-` in a tag.
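To make the concern concrete, here is a small sketch, assuming the existing values are held as a comma-joined string with no spaces (an assumption on my part; the class name, method names, and sample string are hypothetical). `\b` treats `-` as a non-word character, so a search for `ajp` also hits the tail of `ar-ajp`, whereas anchoring on `,` or the string ends matches only whole tags.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the generator's code.
final class TagBoundaryDemo {

    // Word-boundary search: '-' counts as a boundary, so subtags can match.
    static boolean wordBoundaryMatch(String tag, String joined) {
        return Pattern.compile("\\b" + tag + "\\b").matcher(joined).find();
    }

    // Comma-delimited search: the tag must span a whole list element.
    static boolean commaDelimitedMatch(String tag, String joined) {
        return Pattern.compile("(^|,)" + Pattern.quote(tag) + "(,|$)")
                .matcher(joined)
                .find();
    }

    public static void main(String[] args) {
        String joined = "ar-ajp,ar-apc"; // assumed comma-joined value list

        System.out.println(wordBoundaryMatch("ajp", joined));      // true (false positive)
        System.out.println(commaDelimitedMatch("ajp", joined));    // false
        System.out.println(commaDelimitedMatch("ar-apc", joined)); // true
    }
}
```

`Pattern.quote` is not strictly needed for well-formed language tags, but it keeps the pattern safe if a value ever contains a regex metacharacter.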

make/jdk/src/classes/build/tools/generatelsrequivmaps/EquivMapsGenerator.java line 289:

> 287:                 + sortedLanguageMap2.size() + ");\n"
> 288:                 + "        regionVariantEquivMap = HashMap.newHashMap("
> 289:                 + sortedRegionVariantMap.size() + ");\n\n"

It'd be nice if these messy concatenated strings were converted to a text block, but that would be for another day.
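For reference, a rough sketch of what that conversion might look like. The class, method, and parameter names are hypothetical, and only two of the emitted lines are shown; the real generator emits more. Both methods produce the same string, including the eight-space indent and the trailing blank line.

```java
// Hypothetical illustration only; not the generator's code.
final class TextBlockDemo {

    // Current style: string concatenation with explicit newlines.
    static String concatenated(int multiSize, int regionVariantSize) {
        return "        multiEquivsMap = HashMap.newHashMap("
                + multiSize + ");\n"
                + "        regionVariantEquivMap = HashMap.newHashMap("
                + regionVariantSize + ");\n\n";
    }

    // Text block: the closing delimiter's column sets the stripped indent,
    // so the content lines keep eight leading spaces in the result.
    static String textBlock(int multiSize, int regionVariantSize) {
        return """
                multiEquivsMap = HashMap.newHashMap(%d);
                regionVariantEquivMap = HashMap.newHashMap(%d);

        """.formatted(multiSize, regionVariantSize);
    }

    public static void main(String[] args) {
        System.out.println(concatenated(3, 5).equals(textBlock(3, 5)));
    }
}
```

The text block keeps the generated source readable in place, at the cost of making the output indentation depend on where the closing `"""` sits.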

-------------

PR Review: https://git.openjdk.org/jdk/pull/13501#pullrequestreview-1394424712
PR Review Comment: https://git.openjdk.org/jdk/pull/13501#discussion_r1172870938
PR Review Comment: https://git.openjdk.org/jdk/pull/13501#discussion_r1172876231
PR Review Comment: https://git.openjdk.org/jdk/pull/13501#discussion_r1172878602
