<i18n dev> RFR: 8369590: LocaleEnhanceTest has incorrectly passing test case [v2]
Naoto Sato
naoto at openjdk.org
Mon Oct 13 22:29:05 UTC 2025
On Mon, 13 Oct 2025 21:51:32 GMT, Justin Lu <jlu at openjdk.org> wrote:
>> This PR corrects _test/jdk/java/util/Locale/LocaleEnhanceTest.java_, which has two test cases under `testBuilderSetLanguageTag()` that pass accidentally. One checks that `Locale.Builder.setLanguageTag(String)` throws `IllformedLocaleException` for duplicate extensions, and the other checks the same for duplicate U-extension keys. The test cases are updated to actually exercise the provided code; once corrected, they fail.
>>
>> Fixing the behavior to match the expectation of those test cases is consistent with the specification.
>>
>> From `Locale.forLanguageTag(String)`,
>>
>>>
>>> * <p>If the specified language tag contains any ill-formed subtags,
>>> * the first such subtag and all following subtags are ignored. Compare
>>> * to {@link Locale.Builder#setLanguageTag(String)} which throws an exception
>>> * in this case.
>>
>> and RFC 5646 (BCP 47)
>>
>>> Each singleton subtag MUST appear at most one time in each tag
>>> (other than as a private use subtag). That is, singleton subtags
>>> MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is
>>> invalid because the subtag 'a' appears twice.
>>
>> Since duplicate extensions (and duplicate Unicode keys/attributes) are invalid, the strict `Locale.Builder` should throw `IllformedLocaleException` for such tags, while the lenient `Locale.forLanguageTag` should ignore them. This PR updates the implementation accordingly.
>
> Justin Lu has updated the pull request incrementally with one additional commit since the last revision:
>
> Adding test case to confirm duplicate U-extension attributes for setExtension(char, String)
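For reference, the strict/lenient split described in the quoted PR text can be seen with the public `Locale` API alone. This is a minimal sketch that uses a deliberately ill-formed subtag (`@bad`, chosen here only for illustration) rather than duplicate extensions, because the handling of duplicates is exactly what this PR changes; the class and method names are hypothetical.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class StrictVsLenient {

    // Lenient path: per its spec, Locale.forLanguageTag ignores the first
    // ill-formed subtag and every subtag after it.
    static String lenient(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    // Strict path: Locale.Builder.setLanguageTag rejects the whole tag.
    // Returns true if IllformedLocaleException was thrown.
    static boolean strictRejects(String tag) {
        try {
            new Locale.Builder().setLanguageTag(tag);
            return false;
        } catch (IllformedLocaleException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String tag = "en-US-@bad"; // "@bad" is not a valid BCP 47 subtag
        System.out.println(lenient(tag));       // en-US
        System.out.println(strictRejects(tag)); // true
    }
}
```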
IIUC, the quote from the RFC refers to duplicate singletons. For example, it would reject something like `-u-aa-bbb-u-cc-ddd`. So I believe that rule doesn’t apply to cases like `-u-aa-bbb-AA-ccc`. I checked the `-u` extension definition in LDML but couldn’t find any description regarding duplicate keywords.
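To make the distinction concrete: a repeated singleton and a repeated `-u` key are two different conditions. The sketch below checks each one separately; the method names are hypothetical, it assumes syntactically well-formed input, and it is not the JDK's internal `LanguageTag` parser. Only `hasDuplicateSingleton` corresponds to the RFC 5646 rule; `hasDuplicateUKey` covers the case the RFC does not address.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DuplicateSubtagCheck {

    // RFC 5646 rule: a singleton (one-character subtag introducing an extension)
    // must not repeat. Subtags after the private-use singleton "x" are exempt.
    // The first subtag (primary language) is skipped.
    static boolean hasDuplicateSingleton(String tag) {
        String[] subtags = tag.split("-");
        Set<String> seen = new HashSet<>();
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i].toLowerCase(Locale.ROOT);
            if (s.length() != 1) {
                continue;              // not a singleton
            }
            if (s.equals("x")) {
                break;                 // private use: no further singletons
            }
            if (!seen.add(s)) {
                return true;           // singleton already seen
            }
        }
        return false;
    }

    // Hypothetical check: a two-character key repeated inside a "-u-" extension,
    // compared case-insensitively.
    static boolean hasDuplicateUKey(String tag) {
        String[] subtags = tag.split("-");
        Set<String> keys = new HashSet<>();
        boolean inU = false;
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i].toLowerCase(Locale.ROOT);
            if (s.length() == 1) {     // a singleton starts a new extension
                if (s.equals("x")) {
                    break;
                }
                inU = s.equals("u");
                continue;
            }
            if (inU && s.length() == 2 && !keys.add(s)) {
                return true;           // key already seen
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicateSingleton("en-u-aa-bbb-u-cc-ddd")); // true
        System.out.println(hasDuplicateSingleton("en-u-aa-bbb-AA-ccc"));   // false
        System.out.println(hasDuplicateUKey("en-u-aa-bbb-AA-ccc"));        // true
    }
}
```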
That said, I think it makes sense to allow them in lenient mode and throw an exception in strict mode. Since this would introduce a behavioral change, I’d expect it to require a CSR.
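A rough sketch of what "allow in lenient mode, throw in strict mode" could look like for duplicate `-u` keys. The parser and its `strict` flag are hypothetical; keeping the first occurrence in lenient mode is an assumption of this sketch, not necessarily what the JDK would choose.

```java
import java.util.IllformedLocaleException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class UKeywordParser {

    // Hypothetical parser for the body of a "-u-" extension, e.g. "aa-bbb-AA-ccc".
    // Assumes well-formed input: leading attributes (3-8 chars) are skipped, keys are
    // 2 chars (compared case-insensitively), and a key's type is the run of longer
    // subtags that follows it ("" if there is none).
    static Map<String, String> parse(String uExtension, boolean strict) {
        String[] subtags = uExtension.toLowerCase(Locale.ROOT).split("-");
        Map<String, String> keywords = new LinkedHashMap<>();
        int i = 0;
        while (i < subtags.length && subtags[i].length() > 2) {
            i++; // skip attributes
        }
        while (i < subtags.length) {
            String key = subtags[i++];
            StringBuilder type = new StringBuilder();
            while (i < subtags.length && subtags[i].length() > 2) {
                if (type.length() > 0) {
                    type.append('-');
                }
                type.append(subtags[i++]);
            }
            if (keywords.containsKey(key)) {
                if (strict) {
                    // Strict mode: reject the input outright.
                    throw new IllformedLocaleException("Duplicate U-extension key: " + key);
                }
                continue; // Lenient mode: accept the input, keep the first occurrence.
            }
            keywords.put(key, type.toString());
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(parse("aa-bbb-AA-ccc", false)); // {aa=bbb}
        try {
            parse("aa-bbb-AA-ccc", true);
        } catch (IllformedLocaleException e) {
            System.out.println("strict mode rejected: " + e.getMessage());
        }
    }
}
```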
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27775#issuecomment-3399242375