<i18n dev> RFR: 8369590: LocaleEnhanceTest has incorrectly passing test case [v2]

Naoto Sato naoto at openjdk.org
Mon Oct 13 22:29:05 UTC 2025


On Mon, 13 Oct 2025 21:51:32 GMT, Justin Lu <jlu at openjdk.org> wrote:

>> This PR corrects _test/jdk/java/util/Locale/LocaleEnhanceTest.java_, in which two test cases under `testBuilderSetLanguageTag()` pass by accident. One checks that `Locale.Builder.setLanguageTag(String)` throws `IllformedLocaleException` (ILE) for duplicate extensions, and the other checks the same for duplicate U-extension keys. The test cases are updated to actually test the provided code. Once fixed, they fail.
>> 
>> Fixing the behavior to match the expectation of those test cases is consistent with the specification.
>> 
>> From `Locale.forLanguageTag(String)`,
>> 
>>> 
>>>      * <p>If the specified language tag contains any ill-formed subtags,
>>>      * the first such subtag and all following subtags are ignored.  Compare
>>>      * to {@link Locale.Builder#setLanguageTag(String)} which throws an exception
>>>      * in this case.
>> 
>> and the RFC specification
>> 
>>> Each singleton subtag MUST appear at most one time in each tag
>>>        (other than as a private use subtag).  That is, singleton subtags
>>>        MUST NOT be repeated.  For example, the tag "en-a-bbb-a-ccc" is
>>>        invalid because the subtag 'a' appears twice.
>> 
>> Since duplicate extensions (and Unicode keys/attributes) are invalid, it is appropriate for the strict `Locale.Builder` to throw `IllformedLocaleException` for such tags and for the lenient `Locale.forLanguageTag` to ignore the ill-formed subtags. This PR updates the implementation accordingly.
>
> Justin Lu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding test case to confirm duplicate U-extension attributes for setExtension(char, String)

IIUC, the quote from the RFC refers to duplicate singletons. For example, it would reject something like `-u-aa-bbb-u-cc-ddd`. So I believe that rule doesn’t apply to cases like `-u-aa-bbb-AA-ccc`. I checked the `-u` extension definition in LDML but couldn’t find any description regarding duplicate keywords.
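
To make the distinction concrete, here is a rough sketch of the two checks on a raw tag string. The class and method names are hypothetical and this is not how the JDK parses tags; it assumes that any one-character subtag after the first position is a singleton, that `x` starts private use, and that two-character subtags inside `-u-` are keys.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not the JDK implementation.
final class DuplicateSubtagCheck {

    // True if an extension singleton (e.g. 'a' or 'u') repeats, which is
    // the case the quoted RFC text describes.
    static boolean hasDuplicateSingleton(String tag) {
        Set<String> seen = new HashSet<>();
        String[] subtags = tag.toLowerCase(Locale.ROOT).split("-");
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i];
            if (s.length() != 1) {
                continue;               // not a singleton
            }
            if (s.equals("x")) {
                break;                  // private use: stop checking
            }
            if (!seen.add(s)) {
                return true;            // same singleton seen before
            }
        }
        return false;
    }

    // True if a two-character key repeats inside a -u- extension,
    // compared case-insensitively.
    static boolean hasDuplicateUnicodeKey(String tag) {
        Set<String> keys = new HashSet<>();
        String[] subtags = tag.toLowerCase(Locale.ROOT).split("-");
        boolean inUnicodeExtension = false;
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i];
            if (s.length() == 1) {
                if (s.equals("x")) {
                    break;              // private use: stop checking
                }
                inUnicodeExtension = s.equals("u");
                continue;
            }
            if (inUnicodeExtension && s.length() == 2 && !keys.add(s)) {
                return true;            // same key seen before
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {
                "en-a-bbb-a-ccc", "en-u-aa-bbb-u-cc-ddd", "en-u-aa-bbb-AA-ccc"}) {
            System.out.printf("%s: duplicate singleton=%b, duplicate -u- key=%b%n",
                    tag, hasDuplicateSingleton(tag), hasDuplicateUnicodeKey(tag));
        }
    }
}
```

Under these assumptions only the first check corresponds to the RFC sentence quoted above; the second covers the `-u-aa-bbb-AA-ccc` style of tag.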

That said, I think it makes sense to allow them in lenient mode and throw an exception in strict mode. Since this would introduce a behavioral change, I’d expect it to require a CSR.
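
For anyone who wants to compare the two modes on a given JDK build, a minimal probe could look like the following. `LanguageTagProbe` and its method names are made up for this sketch. The outcome for the duplicate-extension and duplicate-key tags depends on whether the build includes this change, so the sketch prints what it observes rather than assuming a result.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical probe class; it only calls public java.util.Locale APIs.
final class LanguageTagProbe {

    // Strict mode: Locale.Builder throws IllformedLocaleException for
    // tags it considers ill-formed.
    static boolean strictAccepts(String tag) {
        try {
            new Locale.Builder().setLanguageTag(tag);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    // Lenient mode: Locale.forLanguageTag never throws for ill-formed
    // tags; return what it kept, as a language tag.
    static String lenientResult(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        String[] tags = {
            "en-US-abcdefghi",      // over-long subtag, clearly ill-formed
            "en-a-bbb-a-ccc",       // duplicate singleton
            "en-u-aa-bbb-aa-ccc"    // duplicate U-extension key
        };
        for (String tag : tags) {
            System.out.printf("%s: strict accepts=%b, lenient result=%s%n",
                    tag, strictAccepts(tag), lenientResult(tag));
        }
    }
}
```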

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27775#issuecomment-3399242375

