RFR: 8041488: Locale-Dependent List Patterns [v7]

Naoto Sato naoto at openjdk.org
Thu Aug 10 17:34:59 UTC 2023


On Wed, 9 Aug 2023 23:39:24 GMT, Joe Wang <joehw at openjdk.org> wrote:

>> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Small cleanup
>
> src/java.base/share/classes/java/text/ListFormat.java line 58:
> 
>> 56:  *     .format(List.of("Foo", "Bar", "Baz"))
>> 57:  * }
>> 58:  * This will produce the concatenated list string, "Foo, Bar, and Baz" as seen in
> 
> With this sample code, if the Style is changed to SHORT, it would produce the same string. Would it be better to use the weekdays instead of Foo, Bar and Baz (as in the Unicode spec)? Esp. with the UNIT type, those examples explained it better, e.g. NARROW produces 3′ 7″.
> 
> Also, if the instance is of STANDARD/SHORT, does it format List.of("January", "February", "March") and return "Jan., Feb., and Mar.", or 3 feet, 7 inches to 3 ft, 7 in? The format method states simply "Returns the string that consists of the input strings, concatenated with the patterns of this ListFormat."  I wonder if it'd be helpful to explain a bit more or add one more sample.

In fact, the sample in LDML's page seems to be incorrect. `standard-short` in English is defined as:

		<listPattern type="standard-short">
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0}, &amp; {1}</listPatternPart>
			<listPatternPart type="2">{0} &amp; {1}</listPatternPart>
		</listPattern>

in the `en.xml` file. So `&` is expected rather than `and`.
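
To illustrate how these four pattern parts combine, here is a minimal sketch. It is not the `ListFormat` implementation; the class `ListPatternSketch` and its helpers are hypothetical names, the patterns are hard-coded from the `standard-short` entry quoted above, and throwing on an empty list is just a choice made for the sketch. It builds the result from the end of the list backwards, and it copies the input strings verbatim.

```java
import java.util.List;

public class ListPatternSketch {
    // Pattern parts copied from the English standard-short entry quoted above.
    static final String START = "{0}, {1}";
    static final String MIDDLE = "{0}, {1}";
    static final String END = "{0}, & {1}";
    static final String TWO = "{0} & {1}";

    // Fills the {0} and {1} placeholders. Assumes {0} precedes {1} in the
    // pattern, which holds for the four patterns above.
    static String apply(String pattern, String first, String second) {
        int i0 = pattern.indexOf("{0}");
        int i1 = pattern.indexOf("{1}");
        return pattern.substring(0, i0) + first
                + pattern.substring(i0 + 3, i1) + second
                + pattern.substring(i1 + 3);
    }

    // Hypothetical helper: joins the items using the start/middle/end/2 parts.
    static String format(List<String> items) {
        int n = items.size();
        if (n == 0) {
            throw new IllegalArgumentException("empty list"); // sketch-only choice
        }
        if (n == 1) {
            return items.get(0);
        }
        if (n == 2) {
            return apply(TWO, items.get(0), items.get(1));
        }
        // The last two items use the "end" part, inner items use "middle",
        // and the first item is attached with "start".
        String result = apply(END, items.get(n - 2), items.get(n - 1));
        for (int i = n - 3; i >= 1; i--) {
            result = apply(MIDDLE, items.get(i), result);
        }
        return apply(START, items.get(0), result);
    }

    public static void main(String[] args) {
        System.out.println(format(List.of("Foo", "Bar", "Baz")));            // Foo, Bar, & Baz
        System.out.println(format(List.of("Foo", "Bar")));                   // Foo & Bar
        System.out.println(format(List.of("January", "February", "March"))); // January, February, & March
    }
}
```

Only the separators come from the patterns, so "January" stays "January" regardless of which patterns are plugged in.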

> Also, if the instance is of STANDARD/SHORT, does it format List.of("January", "February", "March") and return "Jan., Feb., and Mar.", or 3 feet, 7 inches to 3 ft, 7 in?

No, it does not. The `format()` method does not alter the input strings passed to it, so it would not convert "January" to "Jan." even if the `SHORT` style is specified. I have added some extra explanation noting that those patterns vary depending on the locale provider.

> src/java.base/share/classes/java/text/ListFormat.java line 71:
> 
>> 69:  * <tr><th scope="row" style="text-align:left">STANDARD</th>
>> 70:  *     <td>Foo, Bar, and Baz</td>
>> 71:  *     <td>Foo, Bar, & Baz</td>
> 
> Is "&" a typo?  It's still "and" in the Unicode spec's "standard-short" format, e.g. "Jan., Feb., and Mar."

Again, the ampersand (`&`) is the correct pattern for `SHORT` in English.

> src/java.base/share/classes/java/text/ListFormat.java line 408:
> 
>> 406:         var em = endPattern.matcher(source);
>> 407:         Object parsed = null;
>> 408:         if (sm.find(parsePos.index) && em.find(parsePos.index)) {
> 
> Would it be better to call getIndex() instead? (same below)

Fixed.
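
For readers following along, the accessor pattern looks roughly like the sketch below. This is not the `ListFormat.parse()` code; `ParsePositionSketch` and `nextWord` are hypothetical names and the letters-only regex is an assumption made for the example. It only shows `Matcher.find(int)` driven by `ParsePosition.getIndex()`, with the position advanced through `setIndex()`.

```java
import java.text.ParsePosition;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParsePositionSketch {
    // Assumed token shape for this example only: a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: returns the first word at or after the current
    // parse position and moves the position past it, or returns null and
    // records the error index when no word is found.
    static String nextWord(String source, ParsePosition pos) {
        Matcher m = WORD.matcher(source);
        // Matcher.find(int) resets the matcher and searches from that index.
        if (m.find(pos.getIndex())) {
            pos.setIndex(m.end());
            return m.group();
        }
        pos.setErrorIndex(pos.getIndex());
        return null;
    }

    public static void main(String[] args) {
        ParsePosition pos = new ParsePosition(0);
        System.out.println(nextWord("foo, bar", pos)); // foo
        System.out.println(nextWord("foo, bar", pos)); // bar
        System.out.println(nextWord("foo, bar", pos)); // null
    }
}
```

Code outside the `java.text` package has to go through the accessor anyway, since the `index` field is not public.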

> test/jdk/java/text/Format/ListFormat/TestListFormat.java line 157:
> 
>> 155:                         "foo, bar, baz", true),
>> 156:                 arguments(Locale.US, ListFormat.Type.OR, ListFormat.Style.NARROW,
>> 157:                         "foo, bar, or baz", true),
> 
> Same as in the ListFormat class, the expected results are the same "foo, bar, or baz" when different Styles are specified.

Yes, those are exactly what is defined in CLDR. (They could have chosen `|` for the SHORT style, but I guess that would not be so common in plain English.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1290428317
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1290428359
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1290428267
PR Review Comment: https://git.openjdk.org/jdk/pull/15130#discussion_r1290428459

