RFR: 8306031: Update IANA Language Subtag Registry to Version 2023-04-13 [v4]
Naoto Sato
naoto at openjdk.org
Thu Apr 20 17:13:50 UTC 2023
On Thu, 20 Apr 2023 06:36:47 GMT, Justin Lu <jlu at openjdk.org> wrote:
>> Update the registry and accompanying tests with the **IANA 4/13/2022** update.
>>
>> This update introduces the case where an IANA entry can have a preferred value, but that preferred value has a preferred value as well.
>>
>> This causes unexpected failures in JDK tests because of how locale equivalencies are created.
>>
>> eg: `ar-ajp` has a preferred value of `ajp` but `ajp` has a preferred value of `apc`
>>
>> Normally, when the JDK is built, _LocaleEquivlalentMaps.java_ generates the following
>>
>>
>> ...
>> singleEquivMap.put("ar-ajp", "ajp");
>> singleEquivMap.put("ajp", "ar-ajp");
>> ...
>> multiEquivsMap.put("ajp", new String[] {"apc", "ar-apc"});
>> multiEquivsMap.put("apc", new String[] {"ajp", "ar-apc"});
>> multiEquivsMap.put("ar-apc", new String[] {"apc", "ajp"});
>> ...
>>
>>
>> When `LocaleMatcher.parse(ACCEPT_LANGUAGE)` is called with `ACCEPT_LANGUAGE` containing `apc` and `ajp` in that order, the following occurs:
>>
>> `apc` is found, `apc` is added, all of `apc's` equivalencies are added: `ajp` and `ar-apc`
>>
>> When parse iterates to `ajp`, it finds that it is already added to the list, and does not add it's equivalency `ar-ajp`.
>>
>> To address this, the build process must be adjusted so that the equivalencies are built as
>>
>>
>> ...
>> multiEquivsMap.put("ajp", new String[] {"apc", "ar-ajp", "ar-apc"});
>> multiEquivsMap.put("apc", new String[] {"ajp", "ar-ajp", "ar-apc"});
>> multiEquivsMap.put("ar-ajp", new String[] {"apc", "ajp", "ar-apc"});
>> multiEquivsMap.put("ar-apc", new String[] {"apc", "ajp", "ar-ajp"});
>> ...
>>
>>
>> As, if `ar-ajp` has a preferred value of `ajp`, and `ajp` has a preferred value of `apc`, this technically means that `ar-ajp` is equivalent to `apc` and its equivalencies as well. This way, when `LocaleMatcher.parse(ACCEPT_LANGUAGE)` iterates to `apc`, it will add all of it's equivalencies including `ar-ajp`.
>
> Justin Lu has updated the pull request incrementally with one additional commit since the last revision:
>
> Hash map should be initialized with numMappings
Much better. Some comments follow.
make/jdk/src/classes/build/tools/generatelsrequivmaps/EquivMapsGenerator.java line 40:
> 38: import java.util.Locale;
> 39: import java.util.Map;
> 40: import java.util.regex.Pattern;
Nit: sort the imports
make/jdk/src/classes/build/tools/generatelsrequivmaps/EquivMapsGenerator.java line 143:
> 141: // eg: ar-ajp has pref ajp which has pref apc
> 142: boolean foundInOther = false;
> 143: Pattern pattern = Pattern.compile("\\b"+preferred+"\\b");
I think we should explicitly find a pattern with `,` instead of the word boundary. Otherwise, it could match `-` in a tag.
make/jdk/src/classes/build/tools/generatelsrequivmaps/EquivMapsGenerator.java line 289:
> 287: + sortedLanguageMap2.size() + ");\n"
> 288: + " regionVariantEquivMap = HashMap.newHashMap("
> 289: + sortedRegionVariantMap.size() + ");\n\n"
It'd be nice if these messy concatenated strings be converted to a text block, but that would be for another day.
-------------
PR Review: https://git.openjdk.org/jdk/pull/13501#pullrequestreview-1394424712
PR Review Comment: https://git.openjdk.org/jdk/pull/13501#discussion_r1172870938
PR Review Comment: https://git.openjdk.org/jdk/pull/13501#discussion_r1172876231
PR Review Comment: https://git.openjdk.org/jdk/pull/13501#discussion_r1172878602
More information about the core-libs-dev
mailing list